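Publisher's Note

Now that China has joined the WTO and the economy is globalizing,
Higher Education Press has undertaken a planned, large-scale program to
introduce outstanding mathematics textbooks from overseas. The program
responds to the current need of Chinese universities to train innovative
talent of all kinds, promotes the bilingual teaching advocated by the
Ministry of Education, and supports the Ministry's "Teaching Quality and
Teaching Reform Project for Higher Education Institutions" and its
"Quality Courses" initiative.

Higher Education Press has been in extensive contact with Pearson
Education, John Wiley & Sons, McGraw-Hill, Thomson Learning, and other
foreign publishers. On the recommendation of those publishers and with
the help of domestic experts, more than 100 titles were submitted for
rights acquisition. After receiving sample copies, we invited front-line
teachers, experts, and scholars from Chinese universities to review the
original textbooks. With reference to the curricula and actual teaching
conditions of the relevant disciplines in China, we selected this set of
outstanding textbooks for publication.

These textbooks generally share the following features: (1) most were
published within the last three years, are widely used internationally,
and carry considerable authority among comparable textbooks; (2) they
are in high editions, have been tested by many years of teaching
practice, and are full and accurate in content and in step with current
requirements; (3) they come with complete sets of teaching resources,
which are a great convenience to teachers and students; (4) the
illustrations are attractive and plentiful and complement the text;
(5) the language is concise, fluent, and readable, and is well suited to
students in non-English-speaking countries.

The series includes Thomas' Calculus (10th edition, Pearson) by Finney,
Weir, and others, whose character can be summarized as "traditional in
style, innovative in spirit." Since its first edition in the 1950s, a
new edition has appeared every four or five years on average, and the
book has remained popular in Western classrooms for more than 50 years.
Its authors combine a high level of scholarship with a love of teaching
and have long worked on the front line of teaching: Professor
G. B. Thomas, now nearly 90, worked at MIT for many years and has rich
teaching experience; Professor Finney also worked at MIT for 10 years;
and Weir is chair of the U.S. mathematical modeling contest committee.
Stewart's Calculus (5th edition, Thomson Learning), a textbook with a
full multimedia package, comes with rich teaching resources and is the
best-selling original calculus textbook internationally. Its worldwide
sales in 2003 were more than 400,000 copies, and in the United States it
holds about 50% to 60% of the calculus textbook market, with users at
more than 600 institutions, including Yale and other famous universities
as well as many ordinary colleges. The series also includes Anton's
classic textbook Linear Algebra and Its Applications (8th edition,
Wiley) and Jay L. Devore's fine textbook Probability and Mathematical
Statistics (5th edition, Thomson Learning). Higher Education Press has
done a great deal of careful work to keep the prices of the imported
textbooks low. The imported textbooks are authoritative, systematic,
up to date, and economical.

By reprinting, translating, and adapting these outstanding textbooks, we
aim on the one hand to keep analyzing, learning from, and absorbing the
strengths of outstanding foreign textbooks and the production experience
of foreign publishers, and to raise the standard of the integrated
teaching packages that accompany our own textbooks, so that textbook
development in Chinese universities reaches a new level. At the same
time, we will try to organize overseas and domestic authors to co-write
foreign-language mathematics textbooks for foundation courses, and we
will invite domestic experts to adapt some outstanding foreign textbooks
to the actual teaching environment in China.

After these textbooks are published, we will carry out large-scale
publicity and training in coordination with the bilingual teaching plans
of individual universities, and will promptly recommend the series to
universities for adoption. We sincerely hope that university teachers
and students will offer their comments and suggestions as they use the
books. If you know of a good textbook that is worth introducing, please
contact the Higher Science Division of Higher Education Press.

Telephone: 010-58581384. E-mail: xuke@hep.com.cn.

Higher Education Press
April 20, 2004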
Preface

When Allen T. Craig died in late November 1978, I lost my advisor,
mentor, colleague, and very dear friend. Due to his health, Allen did
nothing on the fourth edition and, of course, this revision is mine alone.
There is, however, a great deal of Craig's influence in this book. As a
matter of fact, when I would debate with myself whether or not to
change something, I could hear Allen saying, "It's very good now, Bob;
don't mess it up." Often, I would follow that advice.

Nevertheless, there were a number of things that needed to be done.
I have had many suggestions from my colleagues at the University of
Iowa; in particular, Jim Broffitt, Jon Cryer, Dick Dykstra, Subhash
Kochar (a visitor), Joe Lang, Russ Lenth, and Tim Robertson
provided me with a great deal of constructive criticism. In addition,
three reviewers suggested a number of other topics to include. I have
also had statisticians and students from around the world write to me
about possible improvements. Elliot Tanis, my good friend and
co-author of our Probability and Statistical Inference, gave me
permission to use a few of the figures, examples, and exercises used in
that book. I truly thank these people, who have been so helpful and
generous.

Clearly, I could not use all of these ideas. As a matter of fact, I
resisted adding "real" problems, although a few slipped into the
exercises. Allen and I wanted to write about the mathematics of
statistics, and I have followed that guideline. Hopefully, without those
problems, there is still enough motivation to study mathematical
statistics in this book. In addition, there are a number of excellent
books on applied statistics, and most students have had a little
exposure to applications before studying this book.

The major differences between this edition and the preceding one
are the following:

• There is a better discussion of assigning probabilities to events,
  including introducing independent events and Bayes' theorem in the
  text.

• The consideration of random variables and their expectations is
  greatly improved.

• Sufficient statistics are presented earlier (as was true in the very
  early editions of the book), and minimal sufficient statistics are
  introduced.

• Invariance of the maximum likelihood estimators and invariant
  location- and scale-statistics are considered.

• The expressions "convergence in distribution" and "convergence in
  probability" are used, and the delta method for finding asymptotic
  distributions is spelled out.

• Fisher information is given, and the Rao-Cramér lower bound is
  presented for an estimator of a function of a parameter, not just for
  an unbiased estimator.

• The asymptotic distribution of the maximum likelihood estimator
  is included.

• The discussion of Bayesian procedures has been improved and
  expanded somewhat.

There are also a number of little items that should improve the
understanding of the text: the expressions var and cov are used; the
[...] is in the text; there is more explanation of
p-values; the relationship between two-sided tests and confidence
intervals is noted; the indicator function is used when [...]; the
multivariate normal distribution is given earlier (for those with an
appropriate background in matrices, although this is still not necessary
in the use of this book); and there is more on conditioning.

I believe that the order of presentation has been improved; in
particular, sufficient statistics are presented earlier. More exercises
have been introduced; and at the end of each chapter, there are several
additional exercises that have not been ordered by section or by
difficulty (several students had suggested this). Moreover, answers
have not been given for any of these additional exercises because I
thought some instructors might want to use them for questions on
examinations. Finally, the index has been improved greatly, another
suggestion of students as well as of some of my colleagues at Iowa.

There is really enough material in this book for a three-semester
sequence. However, most instructors find that selections from the first
five chapters provide a good one-semester background in the
probability needed for the mathematical statistics based on selections
from the remainder of the book, which certainly would include most
of Chapters 6 and 7.

I am obligated to Catherine M. Thompson and Maxine Merrington
and to Professor E. S. Pearson for permission to include Tables II and
V, which are abridgments and adaptations of tables published in
Biometrika. I wish to thank Oliver & Boyd Ltd., Edinburgh, for
permission to include Table IV, which is an abridgment and adaptation
of Table III from the book Statistical Tables for Biological,
Agricultural, and Medical Research by the late Professor Sir Ronald A.
Fisher, Cambridge, and Dr. Frank Yates, Rothamsted.

Finally, I would like to dedicate this edition to the memory of Allen
Craig and my wife, Carolyn, who died June 25, 1990. Without the love
and support of these two caring persons, I could not have done as much
professionally as I have. My friends in Iowa City and my children
(Mary, Barbara, Allen, and Robert) have given me the strength to
continue. After four previous efforts, I really hope that I have come
close to "getting it right" this fifth time. I will let the readers be the
judge.

R. V. H.
CHAPTER 1

Probability and Distributions


1.1 Introduction

Many kinds of investigations may be characterized in part by
the fact that repeated experimentation, under essentially the same
conditions, is more or less standard procedure. For instance, in medical
research, interest may center on the effect of a drug that is to be
administered; or an economist may be concerned with the prices of
three specified commodities at various time intervals; or the
agronomist may wish to study the effect that a chemical fertilizer has
on the yield of a cereal grain. The only way in which an investigator
can elicit information about any such phenomenon is to perform his
experiment. Each experiment terminates with an outcome. But it
is characteristic of these experiments that the outcome cannot be
predicted with certainty prior to the performance of the experiment.

Suppose that we have such an experiment, the outcome of which
cannot be predicted with certainty, but the experiment is of such a
nature that a collection of every possible outcome can be described
prior to its performance. If this kind of experiment can be repeated
under the same conditions, it is called a random experiment, and the
collection of every possible outcome is called the experimental space
or the sample space.

Example 1. In the toss of a coin, let the outcome tails be denoted by T and
let the outcome heads be denoted by H. If we assume that the coin may be
repeatedly tossed under the same conditions, then the toss of this coin is an
example of a random experiment in which the outcome is one of the two
symbols T and H; that is, the sample space is the collection of these two
symbols.

Example 2. In the cast of one red die and one white die, let the outcome
be the ordered pair (number of spots up on the red die, number of spots up
on the white die). If we assume that these two dice may be repeatedly cast
under the same conditions, then the cast of this pair of dice is a random
experiment and the sample space consists of the following 36 ordered pairs:
$(1, 1), \ldots, (1, 6), (2, 1), \ldots, (2, 6), \ldots, (6, 6)$.
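
The sample space of Example 2 is small enough to list by machine. The following is only an illustrative sketch in Python; the names `faces` and `sample_space` are ours and do not appear in the text.

```python
from itertools import product

# Sample space of Example 2: ordered pairs (red die, white die).
faces = range(1, 7)
sample_space = list(product(faces, faces))

print(len(sample_space))   # 36
print(sample_space[:3])    # [(1, 1), (1, 2), (1, 3)]
```

Keeping the pairs ordered matters here: (1, 2) and (2, 1) are different outcomes because the two dice are distinguishable by color.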



Let $\mathcal{C}$ denote a sample space, and let $C$ represent a part of
$\mathcal{C}$. If, upon the performance of the experiment, the outcome is
in $C$, we shall say that the event $C$ has occurred. Now conceive of our
having made $N$ repeated performances of the random experiment. Then we can
count the number $f$ of times (the frequency) that the event $C$ actually
occurred throughout the $N$ performances. The ratio $f/N$ is called the
relative frequency of the event $C$ in these $N$ experiments. A relative
frequency is usually quite erratic for small values of $N$, as you can
discover by tossing a coin. But as $N$ increases, experience indicates that
we associate with the event $C$ a number, say $p$, that is equal or
approximately equal to that number about which the relative
frequency seems to stabilize. If we do this, then the number $p$ can be
interpreted as that number which, in future performances of the
experiment, the relative frequency of the event $C$ will either equal or
approximate. Thus, although we cannot predict the outcome of a
random experiment, we can, for a large value of $N$, predict
approximately the relative frequency with which the outcome will be
in $C$. The number $p$ associated with the event $C$ is given various names.
Sometimes it is called the probability that the outcome of the random
experiment is in $C$; sometimes it is called the probability of the event
$C$; and sometimes it is called the probability measure of $C$. The context
usually suggests an appropriate choice of terminology.



Example 3. Let $\mathcal{C}$ denote the sample space of Example 2 and let
$C$ be the collection of every ordered pair of $\mathcal{C}$ for which the
sum of the pair is equal to seven. Thus $C$ is the collection (1, 6), (2, 5),
(3, 4), (4, 3), (5, 2), and (6, 1). Suppose that the dice are cast $N = 400$
times and let $f$, the frequency of a sum of seven, be $f = 60$. Then the
relative frequency with which the outcome was in $C$ is
$f/N = 60/400 = 0.15$. Thus we might associate with $C$ a number $p$ that is
close to 0.15, and $p$ would be called the probability of the event $C$.
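
The stabilizing of a relative frequency can be watched in a simulated experiment. The sketch below is an illustration under our own assumptions: fair dice, Python's pseudo-random generator with a fixed seed, and a hypothetical helper named `relative_frequency_of_seven`. It is not part of the original example, whose figures (N = 400, f = 60) are simply stated.

```python
import random

def relative_frequency_of_seven(n_casts, seed=0):
    """Cast two dice n_casts times; return f/N for the event 'sum is seven'."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    f = 0
    for _ in range(n_casts):
        red, white = rng.randint(1, 6), rng.randint(1, 6)
        if red + white == 7:
            f += 1
    return f / n_casts

# The ratio is erratic for small N and settles down as N grows.
for n in (10, 400, 100_000):
    print(n, relative_frequency_of_seven(n))
```

For fair dice the ratio settles near 1/6, since six of the 36 ordered pairs have sum seven.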


Remark. The preceding interpretation of probability is sometimes
referred to as the relative frequency approach, and it obviously depends upon
the fact that an experiment can be repeated under essentially identical
conditions. However, many persons extend probability to other situations by
treating it as a rational measure of belief. For example, the statement
$p = \frac{2}{5}$ would mean to them that their personal or subjective
probability of the event $C$ is equal to $\frac{2}{5}$. Hence, if they are
not opposed to gambling, this could be interpreted as a willingness on their
part to bet on the outcome of $C$ so that the two possible payoffs are in the
ratio $p/(1 - p) = \frac{2}{5}/\frac{3}{5} = \frac{2}{3}$. Moreover, if they
truly believe that $p = \frac{2}{5}$ is correct, they would be willing to
accept either side of the bet: (a) win 3 units if $C$ occurs and lose 2 if it
does not occur, or (b) win 2 units if $C$ does not occur and lose 3 if it
does. However, since the mathematical properties of probability given in
Section 1.3 are consistent with either of these interpretations, the
subsequent mathematical development does not depend upon which approach is
used.
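
To see why a person would accept either side of the bet in the Remark, one can compute the expected gain of each side. The sketch below assumes p = 2/5, the value for which stakes of 3 and 2 units are fair; the variable names are ours.

```python
from fractions import Fraction

p = Fraction(2, 5)            # subjective probability of the event C
odds = p / (1 - p)            # ratio p/(1 - p) of the two payoffs

# Expected gain of each side of the bet described in the Remark.
gain_a = 3 * p - 2 * (1 - p)  # (a) win 3 if C occurs, lose 2 otherwise
gain_b = 2 * (1 - p) - 3 * p  # (b) win 2 if C does not occur, lose 3 otherwise

print(odds, gain_a, gain_b)   # 2/3 0 0
```

Both expected gains are zero, so neither side is favored.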

The primary purpose of having a mathematical theory of statistics
is to provide mathematical models for random experiments. Once a
model for such an experiment has been provided and the theory worked
out in detail, the statistician may, within this framework, make
inferences (that is, draw conclusions) about the random experiment.
The construction of such a model requires a theory of probability.
One of the more logically satisfying theories of probability is that
based on the concepts of sets and functions of sets. These concepts
are introduced in Section 1.2.


1.2 Set Theory

The concept of a set or a collection of objects is usually left
undefined. However, a particular set can be described so that there is
no misunderstanding as to what collection of objects is under
consideration. For example, the set of the first 10 positive integers is
sufficiently well described to make clear that the numbers $\frac{3}{4}$
and 14 are not in the set, while the number 3 is in the set. If an object
belongs to a set, it is said to be an element of the set. For example, if
$A$ denotes the set of real numbers $x$ for which $0 \le x \le 1$, then
$\frac{3}{4}$ is an element of the set $A$. The fact that $\frac{3}{4}$ is
an element of the set $A$ is indicated by writing $\frac{3}{4} \in A$. More
generally, $a \in A$ means that $a$ is an element of the set $A$.

The sets that concern us will frequently be sets of numbers.
However, the language of sets of points proves somewhat more
convenient than that of sets of numbers. Accordingly, we briefly
indicate how we use this terminology. In analytic geometry considerable
emphasis is placed on the fact that to each point on a line (on which
an origin and a unit point have been selected) there corresponds one
and only one number, say $x$; and that to each number $x$ there
corresponds one and only one point on the line. This one-to-one
correspondence between the numbers and points on a line enables us
to speak, without misunderstanding, of the "point $x$" instead of the
"number $x$." Furthermore, with a plane rectangular coordinate system
and with $x$ and $y$ numbers, to each symbol $(x, y)$ there corresponds one
and only one point in the plane; and to each point in the plane there
corresponds but one such symbol. Here again, we may speak of the
"point $(x, y)$," meaning the "ordered number pair $x$ and $y$." This
convenient language can be used when we have a rectangular
coordinate system in a space of three or more dimensions. Thus the
"point $(x_1, x_2, \ldots, x_n)$" means the numbers $x_1, x_2, \ldots, x_n$
in the order stated. Accordingly, in describing our sets, we frequently
speak of a set of points (a set whose elements are points), being careful,
of course, to describe the set so as to avoid any ambiguity. The notation
$A = \{x : 0 \le x \le 1\}$ is read "$A$ is the one-dimensional set of
points $x$ for which $0 \le x \le 1$." Similarly,
$A = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$ can be read "$A$ is the
two-dimensional set of points $(x, y)$ that are interior to, or on the
boundary of, a square with opposite vertices at $(0, 0)$ and $(1, 1)$."
We now give some definitions (together with illustrative examples) that
lead to an elementary algebra of sets adequate for our purposes.

Definition 1. If each element of a set $A_1$ is also an element of set
$A_2$, the set $A_1$ is called a subset of the set $A_2$. This is indicated
by writing $A_1 \subset A_2$. If $A_1 \subset A_2$ and also
$A_2 \subset A_1$, the two sets have the same elements, and this is
indicated by writing $A_1 = A_2$.

Example 1. Let $A_1 = \{x : 0 \le x \le 1\}$ and
$A_2 = \{x : -1 \le x \le 2\}$. Here the one-dimensional set $A_1$ is seen
to be a subset of the one-dimensional set $A_2$; that is,
$A_1 \subset A_2$. Subsequently, when the dimensionality of the set is clear,
we shall not make specific reference to it.

Example 2. Let $A_1 = \{(x, y) : 0 \le x = y \le 1\}$ and
$A_2 = \{(x, y) : 0 \le x \le 1, 0 \le y \le 1\}$. Since the elements of
$A_1$ are the points on one diagonal of the square, then $A_1 \subset A_2$.

Definition 2. If a set $A$ has no elements, $A$ is called the null set. This
is indicated by writing $A = \emptyset$.


Definition 3. The set of all elements that belong to at least one of
the sets $A_1$ and $A_2$ is called the union of $A_1$ and $A_2$. The union
of $A_1$ and $A_2$ is indicated by writing $A_1 \cup A_2$. The union of
several sets $A_1, A_2, A_3, \ldots$ is the set of all elements that belong
to at least one of the several sets. This union is denoted by
$A_1 \cup A_2 \cup A_3 \cup \cdots$ or by
$A_1 \cup A_2 \cup \cdots \cup A_k$ if a finite number $k$ of sets is
involved.



Example 3. Let $A_1 = \{x : x = 0, 1, \ldots, 10\}$ and
$A_2 = \{x : x = 8, 9, 10, 11, \text{ or } 11 < x \le 12\}$. Then
$A_1 \cup A_2 = \{x : x = 0, 1, \ldots, 8, 9, 10, 11, \text{ or } 11 < x \le 12\}
= \{x : x = 0, 1, \ldots, 8, 9, 10, \text{ or } 11 \le x \le 12\}$.

Example 4. Let $A_1$ and $A_2$ be defined as in Example 1. Then
$A_1 \cup A_2 = A_2$.

Example 5. Let $A_2 = \emptyset$. Then $A_1 \cup A_2 = A_1$ for every set
$A_1$.

Example 6. For every set $A$, $A \cup A = A$.

Example 7. Let

$$A_k = \left\{x : \frac{1}{k + 1} \le x \le 1\right\}, \qquad k = 1, 2, 3, \ldots.$$

Then $A_1 \cup A_2 \cup A_3 \cup \cdots = \{x : 0 < x \le 1\}$. Note that the
number zero is not in this set, since it is not in one of the sets
$A_1, A_2, A_3, \ldots$.

Definition 4. The set of all elements that belong to each of the sets
$A_1$ and $A_2$ is called the intersection of $A_1$ and $A_2$. The
intersection of $A_1$ and $A_2$ is indicated by writing $A_1 \cap A_2$. The
intersection of several sets $A_1, A_2, A_3, \ldots$ is the set of all
elements that belong to each of the sets $A_1, A_2, A_3, \ldots$. This
intersection is denoted by $A_1 \cap A_2 \cap A_3 \cap \cdots$ or by
$A_1 \cap A_2 \cap \cdots \cap A_k$ if a finite number $k$ of sets is
involved.

Example 8. Let $A_1 = \{(0, 0), (0, 1), (1, 1)\}$ and
$A_2 = \{(1, 1), (1, 2), (2, 1)\}$. Then $A_1 \cap A_2 = \{(1, 1)\}$.

Example 9. Let $A_1 = \{(x, y) : 0 \le x + y \le 1\}$ and
$A_2 = \{(x, y) : 1 < x + y\}$. Then $A_1$ and $A_2$ have no points in
common and $A_1 \cap A_2 = \emptyset$.

Example 10. For every set $A$, $A \cap A = A$ and
$A \cap \emptyset = \emptyset$.

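Example 11. Let

$$A_k = \left\{x : 0 < x < \frac{1}{k}\right\}, \qquad k = 1, 2, 3, \ldots.$$

Then $A_1 \cap A_2 \cap A_3 \cap \cdots$ is the null set, since there is no
point that belongs to each of the sets $A_1, A_2, A_3, \ldots$.

FIGURE 1.1 (Venn diagrams of $A_1 \cup A_2$ and $A_1 \cap A_2$)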


Example 12. Let $A_1$ and $A_2$ represent the sets of points enclosed,
respectively, by two intersecting circles. Then the sets $A_1 \cup A_2$ and
$A_1 \cap A_2$ are represented, respectively, by the shaded regions in the
Venn diagrams in Figure 1.1.

Example 13. Let $A_1$, $A_2$, and $A_3$ represent the sets of points
enclosed, respectively, by three intersecting circles. Then the sets
$(A_1 \cup A_2) \cap A_3$ and $(A_1 \cap A_2) \cup A_3$ are depicted in
Figure 1.2.


Definition 5. In certain discussions or considerations, the totality
of all elements that pertain to the discussion can be described. This set
of all elements under consideration is given a special name. It is called
the space. We shall often denote spaces by capital script letters such
as $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$.

Example 14. Let the number of heads, in tossing a coin four times, be
denoted by $x$. Of necessity, the number of heads will be one of the numbers
0, 1, 2, 3, 4. Here, then, the space is the set
$\mathcal{A} = \{0, 1, 2, 3, 4\}$.

FIGURE 1.2

Example 15. Consider all nondegenerate rectangles of base $x$ and height
$y$. To be meaningful, both $x$ and $y$ must be positive. Thus the space is
the set $\mathcal{A} = \{(x, y) : x > 0, y > 0\}$.

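Definition 6. Let $\mathcal{A}$ denote a space and let $A$ be a subset of
the set $\mathcal{A}$. The set that consists of all elements of
$\mathcal{A}$ that are not elements of $A$ is called the complement of $A$
(actually, with respect to $\mathcal{A}$). The complement of $A$ is denoted
by $A^*$. In particular, $\mathcal{A}^* = \emptyset$.

Example 16. Let $\mathcal{A}$ be defined as in Example 14, and let the set
$A = \{0, 1\}$. The complement of $A$ (with respect to $\mathcal{A}$) is
$A^* = \{2, 3, 4\}$.

Example 17. Given $A \subset \mathcal{A}$. Then
$A \cup A^* = \mathcal{A}$, $A \cap A^* = \emptyset$,
$A \cup \mathcal{A} = \mathcal{A}$, $A \cap \mathcal{A} = A$, and
$(A^*)^* = A$.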
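
For finite sets, the operations of union, intersection, and complement can be tried out directly. The following sketch uses Python's built-in `set` type; the sets are taken from Examples 8, 14, and 16 as we read them, and the variable names are ours.

```python
# Finite sets from Examples 8, 14, and 16, written as Python sets.
A1 = {(0, 0), (0, 1), (1, 1)}
A2 = {(1, 1), (1, 2), (2, 1)}
print(A1 & A2)                     # intersection: {(1, 1)}
print(len(A1 | A2))                # the union has 5 distinct points

space = {0, 1, 2, 3, 4}            # number of heads in four tosses
A = {0, 1}
A_star = space - A                 # complement with respect to the space
print(sorted(A_star))              # [2, 3, 4]

# Identities of Example 17.
assert A | A_star == space
assert A & A_star == set()
assert space - A_star == A         # (A*)* = A
```

Note that the complement is computed as a set difference from the space; Python has no notion of a complement without naming the space, which mirrors the phrase "with respect to" in Definition 6.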

In the calculus, functions such as

$$f(x) = 2x, \qquad -\infty < x < \infty,$$

or

$$g(x, y) = e^{-x-y}, \quad 0 < x < \infty,\ 0 < y < \infty,
\qquad = 0 \text{ elsewhere},$$

or possibly

$$h(x_1, x_2, \ldots, x_n) = 3x_1 x_2 \cdots x_n, \quad 0 \le x_i \le 1,\
i = 1, 2, \ldots, n, \qquad = 0 \text{ elsewhere},$$

were of common occurrence. The value of $f(x)$ at the "point $x = 1$" is
$f(1) = 2$; the value of $g(x, y)$ at the "point $(-1, 3)$" is
$g(-1, 3) = 0$; the value of $h(x_1, x_2, \ldots, x_n)$ at the "point
$(1, 1, \ldots, 1)$" is 3. Functions such as these are called functions of a
point or, more simply, point functions because they are evaluated (if they
have a value) at a point in a space of indicated dimension.

There is no reason why, if they prove useful, we should not have
functions that can be evaluated, not necessarily at a point, but for an
entire set of points. Such functions are naturally called functions of a
set or, more simply, set functions. We shall give some examples of set
functions and evaluate them for certain simple sets.

Example 18. Let $A$ be a set in one-dimensional space and let $Q(A)$ be equal
to the number of points in $A$ which correspond to positive integers. Then
$Q(A)$ is a function of the set $A$. Thus, if $A = \{x : 0 < x < 5\}$, then
$Q(A) = 4$; if $A = \{-2, -1\}$, then $Q(A) = 0$; if
$A = \{x : -\infty < x < 6\}$, then $Q(A) = 5$.
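
A set function such as the one in Example 18 takes a whole set as its argument. One way to sketch this in Python is to pass the set as a membership test. The function `Q` and its `search_limit` parameter below are our own illustrative choices; the cutoff is an assumption that is adequate only when the set holds no positive integers at or beyond it.

```python
def Q(contains, search_limit=1000):
    """Count the positive integers k < search_limit with contains(k) true.

    `contains` is a membership test for the set A; `search_limit` is an
    assumption of this sketch, not part of the definition in the text.
    """
    return sum(1 for k in range(1, search_limit) if contains(k))

print(Q(lambda x: 0 < x < 5))        # 4
print(Q(lambda x: x in (-2, -1)))    # 0
print(Q(lambda x: x < 6))            # 5
```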

Example 19. Let $A$ be a set in two-dimensional space and let $Q(A)$ be the
area of $A$, if $A$ has a finite area; otherwise, let $Q(A)$ be undefined.
Thus, if $A = \{(x, y) : x^2 + y^2 \le 1\}$, then $Q(A) = \pi$; if
$A = \{(0, 0), (1, 1), (0, 1)\}$, then $Q(A) = 0$; if
$A = \{(x, y) : 0 \le x, 0 \le y, x + y \le 1\}$, then
$Q(A) = \frac{1}{2}$.

Example 20. Let $A$ be a set in three-dimensional space and let $Q(A)$ be
the volume of $A$, if $A$ has a finite volume; otherwise, let $Q(A)$ be
undefined. Thus, if
$A = \{(x, y, z) : 0 \le x \le 2, 0 \le y \le 1, 0 \le z \le 3\}$, then
$Q(A) = 6$; if $A = \{(x, y, z) : x^2 + y^2 + z^2 \ge 1\}$, then $Q(A)$ is
undefined.

At this point we introduce the following notations. The symbol

$$\int_A f(x)\,dx$$

will mean the ordinary (Riemann) integral of $f(x)$ over a prescribed
one-dimensional set $A$; the symbol

$$\iint_A g(x, y)\,dx\,dy$$

will mean the Riemann integral of $g(x, y)$ over a prescribed
two-dimensional set $A$; and so on. To be sure, unless these sets $A$ and
these functions $f(x)$ and $g(x, y)$ are chosen with care, the integrals
will frequently fail to exist. Similarly, the symbol

$$\sum_A f(x)$$

will mean the sum extended over all $x \in A$; the symbol

$$\sum\sum_A g(x, y)$$

will mean the sum extended over all $(x, y) \in A$; and so on.

Example 21. Let $A$ be a set in one-dimensional space and let
$Q(A) = \sum_A f(x)$, where

$$f(x) = \left(\tfrac{1}{2}\right)^x, \quad x = 1, 2, 3, \ldots,
\qquad = 0 \text{ elsewhere}.$$

If $A = \{x : 0 \le x \le 3\}$, then

$$Q(A) = \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^3
= \tfrac{7}{8}.$$

Example 22. Let $Q(A) = \sum_A f(x)$, where

$$f(x) = p^x (1 - p)^{1 - x}, \quad x = 0, 1,
\qquad = 0 \text{ elsewhere}.$$

If $A = \{0\}$, then

$$Q(A) = \sum_{x = 0}^{0} p^x (1 - p)^{1 - x} = 1 - p;$$

if $A = \{x : 1 \le x \le 2\}$, then $Q(A) = f(1) = p$.
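
The sums in Examples 21 and 22 can be checked with exact rational arithmetic. The sketch below assumes the functions f(x) = (1/2)^x on x = 1, 2, 3, ... and f(x) = p^x (1 - p)^(1 - x) on x = 0, 1; the helper `Q` and the value p = 3/10 are our own illustrative choices.

```python
from fractions import Fraction

def Q(points, f):
    """Set function Q(A): the sum of f(x) over the listed points of A."""
    return sum(f(x) for x in points)

# Example 21: A = {x : 0 <= x <= 3} contains the support points 1, 2, 3.
f21 = lambda x: Fraction(1, 2) ** x
print(Q([1, 2, 3], f21))              # 7/8

# Example 22 with the arbitrary illustrative value p = 3/10.
p = Fraction(3, 10)
f22 = lambda x: p ** x * (1 - p) ** (1 - x)
print(Q([0], f22), Q([1], f22))       # 7/10 3/10
```

Only the points of A at which f is nonzero need to be listed, since every other point contributes zero to the sum.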

Example 23. Let $A$ be a one-dimensional set and let

$$Q(A) = \int_A e^{-x}\,dx.$$

Thus, if $A = \{x : 0 \le x < \infty\}$, then

$$Q(A) = \int_0^\infty e^{-x}\,dx = 1;$$

if $A = \{x : 1 \le x \le 2\}$, then

$$Q(A) = \int_1^2 e^{-x}\,dx = e^{-1} - e^{-2};$$

if $A_1 = \{x : 0 \le x \le 1\}$ and $A_2 = \{x : 1 < x \le 3\}$, then

$$Q(A_1 \cup A_2) = \int_0^3 e^{-x}\,dx
= \int_0^1 e^{-x}\,dx + \int_1^3 e^{-x}\,dx
= Q(A_1) + Q(A_2);$$

if $A = A_1 \cup A_2$, where $A_1 = \{x : 0 \le x \le 2\}$ and
$A_2 = \{x : 1 \le x \le 3\}$, then

$$Q(A) = Q(A_1 \cup A_2) = \int_0^3 e^{-x}\,dx
= \int_0^2 e^{-x}\,dx + \int_1^3 e^{-x}\,dx - \int_1^2 e^{-x}\,dx
= Q(A_1) + Q(A_2) - Q(A_1 \cap A_2).$$
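
The last identity of Example 23 is easy to confirm numerically for intervals, because the integral of e^(-x) over [a, b] has the closed form e^(-a) - e^(-b). The function `Q` below is a sketch restricted to intervals, which is our simplifying assumption; the set function in the text is defined for more general sets.

```python
import math

def Q(a, b):
    """Q(A) for the interval A = {x : a <= x <= b}: integral of e^(-x)."""
    if b <= a:
        return 0.0
    return math.exp(-a) - math.exp(-b)

# Overlapping intervals A1 = [0, 2] and A2 = [1, 3]:
# the union is [0, 3] and the intersection is [1, 2].
lhs = Q(0, 3)
rhs = Q(0, 2) + Q(1, 3) - Q(1, 2)
print(math.isclose(lhs, rhs))            # True
print(math.isclose(Q(0, math.inf), 1))   # True
```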
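Example 24. Let $A$ be a set in $n$-dimensional space and let

$$Q(A) = \int \cdots \int_A dx_1\,dx_2 \cdots dx_n.$$

If $A = \{(x_1, x_2, \ldots, x_n) : 0 \le x_1 \le x_2 \le \cdots \le x_n \le 1\}$,
then

$$Q(A) = \int_0^1 \int_0^{x_n} \cdots \int_0^{x_3} \int_0^{x_2}
dx_1\,dx_2 \cdots dx_{n-1}\,dx_n = \frac{1}{n!},$$

where $n! = n(n - 1) \cdots 3 \cdot 2 \cdot 1$.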
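
The value 1/n! in Example 24 can be made plausible by sampling: Q(A) is the fraction of the unit cube in which the coordinates happen to be in nondecreasing order. The following Monte Carlo sketch is our own illustration, not the method of the text; the function name, trial count, and seed are arbitrary assumptions.

```python
import math
import random

def estimate_volume(n, trials=100_000, seed=0):
    """Estimate Q(A) for A = {0 <= x1 <= ... <= xn <= 1} by sampling."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        point = [rng.random() for _ in range(n)]
        if all(point[i] <= point[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / trials   # fraction of the unit cube that lies in A

# Compare each estimate with the exact value 1/n!.
for n in (2, 3, 4):
    print(n, estimate_volume(n), 1 / math.factorial(n))
```

The estimates agree with 1/n! only approximately, as is to be expected from a random sample.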
EXERCISES

1.1. Find the union $A_1 \cup A_2$ and the intersection $A_1 \cap A_2$ of
the two sets $A_1$ and $A_2$, where:

(a) $A_1 = \{0, 1, 2\}$, $A_2 = \{2, 3, 4\}$.

(b) $A_1 = \{x : 0 < x < 2\}$, $A_2 = \{x : 1 < x < 3\}$.

(c) $A_1 = \{(x, y) : 0 < x < 2, 0 < y < 2\}$,
$A_2 = \{(x, y) : 1 < x < 3, 1 < y < 3\}$.

1.2. Find the complement $A^*$ of the set $A$ with respect to the space
$\mathcal{A}$ if:

(a) $\mathcal{A} = \{x : 0 < x < 1\}$, $A = \{x : [\ldots] < x < 1\}$.

(b) $\mathcal{A} = \{(x, y, z) : x^2 + y^2 + z^2 < 1\}$,
$A = \{(x, y, z) : x^2 + y^2 + z^2 = [\ldots]\}$.

(c) $\mathcal{A} = \{(x, y) : |x| + |y| < 2\}$,
$A = \{(x, y) : x^2 + y^2 < 2\}$.

1.3. List all possible arrangements of the four letters m, a, r, and y. Let
$A_1$ be the collection of the arrangements in which y is in the last
position. Let $A_2$ be the collection of the arrangements in which m is in
the first position. Find the union and intersection of $A_1$ and $A_2$.

1.4. By use of Venn diagrams, in which the space $\mathcal{A}$ is the set of
points enclosed by a rectangle containing the circles, compare the following
sets:

(a) $A_1 \cap (A_2 \cup A_3)$ and $(A_1 \cap A_2) \cup (A_1 \cap A_3)$.

(b) $A_1 \cup (A_2 \cap A_3)$ and $(A_1 \cup A_2) \cap (A_1 \cup A_3)$.

(c) $(A_1 \cup A_2)^*$ and $A_1^* \cap A_2^*$.

(d) $(A_1 \cap A_2)^*$ and $A_1^* \cup A_2^*$.

1.5. If a sequence of sets $A_1, A_2, A_3, \ldots$ is such that
$A_k \subset A_{k+1}$, $k = 1, 2, 3, \ldots$, the sequence is said to be a
nondecreasing sequence. Give an example of this kind of sequence of sets.

1.6. If a sequence of sets $A_1, A_2, A_3, \ldots$ is such that
$A_k \supset A_{k+1}$, $k = 1, 2, 3, \ldots$, the sequence is said to be a
nonincreasing sequence. Give an example of this kind of sequence of sets.

1.7. If $A_1, A_2, A_3, \ldots$ are sets such that $A_k \subset A_{k+1}$,
$k = 1, 2, 3, \ldots$, $\lim_{k \to \infty} A_k$ is defined as the union
$A_1 \cup A_2 \cup A_3 \cup \cdots$. Find $\lim_{k \to \infty} A_k$ if:

(a) $A_k = \{x : 1/k < x < 3 - 1/k\}$, $k = 1, 2, 3, \ldots$.

(b) $A_k = \{(x, y) : 1/k < x^2 + y^2 < 4 - 1/k\}$, $k = 1, 2, 3, \ldots$.

1.8. If $A_1, A_2, A_3, \ldots$ are sets such that $A_k \supset A_{k+1}$,
$k = 1, 2, 3, \ldots$, $\lim_{k \to \infty} A_k$ is defined as the
intersection $A_1 \cap A_2 \cap A_3 \cap \cdots$. Find
$\lim_{k \to \infty} A_k$ if:

(a) $A_k = \{x : 2 - 1/k < x < 2\}$, $k = 1, 2, 3, \ldots$.

(b) $A_k = \{x : 2 < x < 2 + 1/k\}$, $k = 1, 2, 3, \ldots$.

(c) $A_k = \{(x, y) : 0 < x^2 + y^2 \le 1/k\}$, $k = 1, 2, 3, \ldots$.
1.9. For every one-dimensional set $A$, let $Q(A) = \sum_A f(x)$, where
$f(x) = \left(\frac{2}{3}\right)\left(\frac{1}{3}\right)^x$,
$x = 0, 1, 2, \ldots$, zero elsewhere. If $A_1 = \{x : x = 0, 1, 2, 3\}$ and
$A_2 = \{x : x = 0, 1, 2, \ldots\}$, find $Q(A_1)$ and $Q(A_2)$.

Hint: Recall that $S_n = a + ar + \cdots + ar^{n-1} = a(1 - r^n)/(1 - r)$ and
$\lim_{n \to \infty} S_n = a/(1 - r)$ provided that $|r| < 1$.

1.10. For every one-dimensional set $A$ for which the integral exists, let
$Q(A) = \int_A f(x)\,dx$, where $f(x) = 6x(1 - x)$, $0 < x < 1$, zero
elsewhere; otherwise, let $Q(A)$ be undefined. If
$A_1 = \{x : \frac{1}{4} < x < \frac{3}{4}\}$, $A_2 = \{\frac{1}{2}\}$, and
$A_3 = \{x : 0 < x < 10\}$, find $Q(A_1)$, $Q(A_2)$, and $Q(A_3)$.

1.11. Let $Q(A) = \iint_A (x^2 + y^2)\,dx\,dy$ for every two-dimensional set
$A$ for which the integral exists; otherwise, let $Q(A)$ be undefined. If
$A_1 = \{(x, y) : -1 \le x \le 1, -1 \le y \le 1\}$,
$A_2 = \{(x, y) : -1 \le x = y \le 1\}$, and
$A_3 = \{(x, y) : x^2 + y^2 \le 1\}$, find $Q(A_1)$, $Q(A_2)$, and $Q(A_3)$.

Hint: In evaluating $Q(A_2)$, recall the definition of the double integral
(or consider the volume under the surface $z = x^2 + y^2$ above the line
segment $-1 \le x = y \le 1$ in the $xy$-plane). Use polar coordinates in
the calculation of $Q(A_3)$.

1.12. Let $\mathcal{A}$ denote the set of points that are interior to, or on
the boundary of, a square with opposite vertices at the points $(0, 0)$ and
$(1, 1)$. Let $Q(A) = \iint_A dy\,dx$.

(a) If $A \subset \mathcal{A}$ is the set $\{(x, y) : 0 < x < y < 1\}$,
compute $Q(A)$.

(b) If $A \subset \mathcal{A}$ is the set $\{(x, y) : 0 < x = y < 1\}$,
compute $Q(A)$.

(c) If $A \subset \mathcal{A}$ is the set
$\{(x, y) : 0 < x/2 \le y \le 3x/2 < 1\}$, compute $Q(A)$.

1.13. Let $\mathcal{A}$ be the set of points interior to or on the boundary
of a cube with edge of length 1. Moreover, say that the cube is in the first
octant with one vertex at the point $(0, 0, 0)$ and an opposite vertex at the
point $(1, 1, 1)$. Let $Q(A) = \iiint_A dx\,dy\,dz$.

(a) If $A$ is the set $\{(x, y, z) : 0 < x < y < z < 1\}$, compute $Q(A)$.

(b) If $A$ is the subset $\{(x, y, z) : 0 < x = y = z < 1\}$, compute
$Q(A)$.

1.14. Let $A$ denote the set $\{(x, y, z) : x^2 + y^2 + z^2 \le 1\}$.
Evaluate $Q(A) = \iiint_A \sqrt{x^2 + y^2 + z^2}\,dx\,dy\,dz$.

Hint: Use spherical coordinates.

1.15. To join a certain club, a person must be either a statistician or a
mathematician or both. Of the 25 members in this club, 19 are statisticians
and 16 are mathematicians. How many persons in the club are both a
statistician and a mathematician?

1.16. After a hard-fought football game, it was reported that, of the 11
starting players, 8 hurt a hip, 6 hurt an arm, 5 hurt a knee, 3 hurt both a
hip and an arm, 2 hurt both a hip and a knee, 1 hurt both an arm and a
knee, and no one hurt all three. Comment on the accuracy of the report.


1.3 The Probability Set Function

Let $\mathcal{C}$ denote the set of every possible outcome of a random
experiment; that is, $\mathcal{C}$ is the sample space. It is our purpose to
define a set function $P(C)$ such that if $C$ is a subset of $\mathcal{C}$,
then $P(C)$ is the probability that the outcome of the random experiment is
an element of $C$. Henceforth it will be tacitly assumed that the structure
of each set $C$ is sufficiently simple to allow the computation. We have
already seen that advantages accrue if we take $P(C)$ to be that number about
which the relative frequency $f/N$ of the event $C$ tends to stabilize after
a long series of experiments. This important fact suggests some of the
properties that we would surely want the set function $P(C)$ to possess.
For example, no relative frequency is ever negative; accordingly, we
would want $P(C)$ to be a nonnegative set function. Again, the relative
frequency of the whole sample space $\mathcal{C}$ is always 1. Thus we would
want $P(\mathcal{C}) = 1$. Finally, if $C_1, C_2, C_3, \ldots$ are subsets
of $\mathcal{C}$ such that no two of these subsets have a point in common,
the relative frequency of the union of these sets is the sum of the relative
frequencies of the sets, and we would want the set function $P(C)$ to
reflect this additive property. We now formally define a probability set
function.


Definition 7. If $P(C)$ is defined for a type of subset of the space
$\mathcal{C}$, and if

(a) $P(C) \ge 0$,

(b) $P(C_1 \cup C_2 \cup C_3 \cup \cdots) = P(C_1) + P(C_2) + P(C_3) + \cdots$,
where the sets $C_i$, $i = 1, 2, 3, \ldots$, are such that no two have a
point in common (that is, where $C_i \cap C_j = \emptyset$, $i \ne j$),

(c) $P(\mathcal{C}) = 1$,

then $P$ is called the probability set function of the outcome of the
random experiment. For each subset $C$ of $\mathcal{C}$, the number $P(C)$
is called the probability that the outcome of the random experiment is an
element of the set $C$, or the probability of the event $C$, or the
probability measure of the set $C$.
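
On a finite sample space, the three conditions of the definition can be checked directly. The sketch below assumes the two-dice sample space with every point given probability 1/36, an equally-likely assignment that is our choice for illustration; the definition itself does not require it. The names `sample_space`, `P`, `C1`, and `C2` are ours.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))   # two dice, 36 points

def P(C):
    """Equally-likely probability set function: each point has mass 1/36."""
    return Fraction(len(C & sample_space), len(sample_space))

C1 = {pt for pt in sample_space if sum(pt) == 7}
C2 = {pt for pt in sample_space if sum(pt) == 11}   # no point in common with C1

assert P(C1) >= 0                       # condition (a)
assert P(C1 | C2) == P(C1) + P(C2)      # condition (b), for two disjoint sets
assert P(sample_space) == 1             # condition (c)
print(P(C1), P(C2))                     # 1/6 1/18
```

Condition (b) is checked here only for one pair of disjoint sets; on a finite space the general case follows because P simply counts points.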
A probability set function tells us how the probability is
distributed over various subsets $C$ of a sample space $\mathcal{C}$. In this
sense we speak of a distribution of probability.

Remark. In the definition, the phrase "a type of subset of the space
$\mathcal{C}$" refers to the fact that $P$ is a probability measure on a
sigma field of subsets of $\mathcal{C}$ and would be explained more fully in
a more advanced course. Nevertheless, a few observations can be made about
the collection of subsets that are of the type. From condition (c) of the
definition, we see that the space $\mathcal{C}$ must be in the collection.
Condition (b) implies that if the sets $C_1, C_2, C_3, \ldots$ are in the
collection, their union is also one of that type. Finally, we observe
from the following theorems and their proofs that if the set $C$ is in the
collection, its complement must be one of those subsets. In particular, the
null set, which is the complement of $\mathcal{C}$, must be in the
collection.

The following theorems give us some other properties of a
probability set function. In the statement of each of these theorems,
$P(C)$ is taken, tacitly, to be a probability set function defined for a
certain type of subset of the sample space $\mathcal{C}$.

Theorem 1. For each $C \subset \mathcal{C}$, $P(C) = 1 - P(C^*)$.

Proof. We have $\mathcal{C} = C \cup C^*$ and $C \cap C^* = \emptyset$. Thus, from (c) and
(b) of Definition 7, it follows that

$1 = P(C) + P(C^*)$,

which is the desired result.

Theorem 2. The probability of the null set is zero; that is, $P(\emptyset) = 0$.

Proof. In Theorem 1, take $C = \emptyset$ so that $C^* = \mathcal{C}$. Accordingly, we
have

$P(\emptyset) = 1 - P(\mathcal{C}) = 1 - 1 = 0$,

and the theorem is proved.

Theorem 3. If $C_1$ and $C_2$ are subsets of $\mathcal{C}$ such that $C_1 \subset C_2$, then
$P(C_1) \le P(C_2)$.

Proof. Now $C_2 = C_1 \cup (C_1^* \cap C_2)$ and $C_1 \cap (C_1^* \cap C_2) = \emptyset$. Hence,
from (b) of Definition 7,

$P(C_2) = P(C_1) + P(C_1^* \cap C_2)$.

However, from (a) of Definition 7, $P(C_1^* \cap C_2) \ge 0$; accordingly,
$P(C_2) \ge P(C_1)$.

Theorem 4. For each $C \subset \mathcal{C}$, $0 \le P(C) \le 1$.

Proof. Since $\emptyset \subset C \subset \mathcal{C}$, we have by Theorem 3 that

$P(\emptyset) \le P(C) \le P(\mathcal{C})$ or $0 \le P(C) \le 1$,

the desired result.


Theorem 5. If $C_1$ and $C_2$ are subsets of $\mathcal{C}$, then

$P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1 \cap C_2)$.

Proof. Each of the sets $C_1 \cup C_2$ and $C_2$ can be represented,
respectively, as a union of nonintersecting sets as follows:

$C_1 \cup C_2 = C_1 \cup (C_1^* \cap C_2)$ and $C_2 = (C_1 \cap C_2) \cup (C_1^* \cap C_2)$.

Thus, from (b) of Definition 7,

$P(C_1 \cup C_2) = P(C_1) + P(C_1^* \cap C_2)$

and

$P(C_2) = P(C_1 \cap C_2) + P(C_1^* \cap C_2)$.

If the second of these equations is solved for $P(C_1^* \cap C_2)$ and this result
substituted in the first equation, we obtain

$P(C_1 \cup C_2) = P(C_1) + P(C_2) - P(C_1 \cap C_2)$.

This completes the proof.

Example 1. Let $\mathcal{C}$ denote the sample space of Example 2 of Section 1.1.
Let the probability set function assign a probability of $\frac{1}{36}$ to each of the 36
points in $\mathcal{C}$. If $C_1 = \{(1,1), (2,1), (3,1), (4,1), (5,1)\}$ and $C_2 = \{(1,2), (2,2),
(3,2)\}$, then $P(C_1) = \frac{5}{36}$, $P(C_2) = \frac{3}{36}$, $P(C_1 \cup C_2) = \frac{8}{36}$, and $P(C_1 \cap C_2) = 0$.
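
The counting in Example 1 can be checked by brute force. The following is a minimal sketch, assuming the sample space is the 36 ordered pairs from casting two dice with each point weighted 1/36; the helper name `prob` is ours and does not appear in the text. It also confirms Theorems 1 and 5 for these particular sets.

```python
from fractions import Fraction
from itertools import product

# Sample space: 36 ordered pairs (first die, second die), assumed equally likely.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(C) for a subset C of the sample space, each point weighted 1/36."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

C1 = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)}
C2 = {(1, 2), (2, 2), (3, 2)}

p_c1 = prob(C1)
p_c2 = prob(C2)
p_union = prob(C1 | C2)
p_inter = prob(C1 & C2)

# Theorem 5: P(C1 u C2) = P(C1) + P(C2) - P(C1 n C2).
assert p_union == p_c1 + p_c2 - p_inter
# Theorem 1: P(C) = 1 - P(C*), with C* the complement of C.
assert prob(sample_space - C1) == 1 - p_c1

print(p_c1, p_c2, p_union, p_inter)
```

Exact fractions are used so that the identities hold without rounding error.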

Example 2. Two coins are to be tossed and the outcome is the ordered
pair (face on the first coin, face on the second coin). Thus the sample
space may be represented as $\mathcal{C} = \{(H, H), (H, T), (T, H), (T, T)\}$. Let the
probability set function assign a probability of $\frac{1}{4}$ to each element of $\mathcal{C}$. Let
$C_1 = \{(H, H), (H, T)\}$ and $C_2 = \{(H, H), (T, H)\}$. Then $P(C_1) = P(C_2) = \frac{1}{2}$,
$P(C_1 \cap C_2) = \frac{1}{4}$, and, in accordance with Theorem 5,
$P(C_1 \cup C_2) = \frac{1}{2} + \frac{1}{2} - \frac{1}{4} = \frac{3}{4}$.

Let $\mathcal{C}$ denote a sample space and let $C_1, C_2, C_3, \ldots$ denote subsets
of $\mathcal{C}$. If these subsets are such that no two have an element in
common, they are called mutually disjoint sets and the corresponding
events $C_1, C_2, C_3, \ldots$ are said to be mutually exclusive events. Then,
for example, $P(C_1 \cup C_2 \cup C_3 \cup \cdots) = P(C_1) + P(C_2) + P(C_3) + \cdots$,
in accordance with (b) of Definition 7. Moreover, if $\mathcal{C} =
C_1 \cup C_2 \cup C_3 \cup \cdots$, the mutually exclusive events are further
characterized as being exhaustive and the probability of their union is
obviously equal to 1.

Let $\mathcal{C}$ be partitioned into $k$ mutually disjoint subsets $C_1, C_2, \ldots, C_k$
in such a way that the union of these $k$ mutually disjoint subsets is
the sample space $\mathcal{C}$. Thus the events $C_1, C_2, \ldots, C_k$ are mutually
exclusive and exhaustive. Suppose that the random experiment is
of such a character that it is reasonable to assume that each of
the mutually exclusive and exhaustive events $C_i$, $i = 1, 2, \ldots, k$,
has the same probability. It is necessary, then, that $P(C_i) = 1/k$,
$i = 1, 2, \ldots, k$; and we often say that the events $C_1, C_2, \ldots, C_k$ are
equally likely. Let the event $E$ be the union of $r$ of these mutually
exclusive events, say

$E = C_1 \cup C_2 \cup \cdots \cup C_r$, $r \le k$.

Then

$P(E) = P(C_1) + P(C_2) + \cdots + P(C_r) = \frac{r}{k}$.

Frequently, the integer $k$ is called the total number of ways (for this
particular partition of $\mathcal{C}$) in which the random experiment can
terminate and the integer $r$ is called the number of ways that are
favorable to the event $E$. So, in this terminology, $P(E)$ is equal to the
number of ways favorable to the event $E$ divided by the total number
of ways in which the experiment can terminate. It should be
emphasized that in order to assign, in this manner, the probability $r/k$
to the event $E$, we must assume that each of the mutually exclusive and
exhaustive events $C_1, C_2, \ldots, C_k$ has the same probability $1/k$. This
assumption of equally likely events then becomes a part of our
probability model. Obviously, if this assumption is not realistic in an
application, the probability of the event $E$ cannot be computed in this
way.

We next present an example that is illustrative of this model. 
Example 3. Let a card be drawn at random from an ordinary deck of
52 playing cards. The sample space $\mathcal{C}$ is the union of $k = 52$ outcomes, and it is
reasonable to assume that each of these outcomes has the same probability
$\frac{1}{52}$. Accordingly, if $E_1$ is the set of outcomes that are spades, $P(E_1) = \frac{13}{52} = \frac{1}{4}$
because there are $r_1 = 13$ spades in the deck; that is, $\frac{1}{4}$ is the probability of
drawing a card that is a spade. If $E_2$ is the set of outcomes that are kings,
$P(E_2) = \frac{4}{52} = \frac{1}{13}$ because there are $r_2 = 4$ kings in the deck; that is, $\frac{1}{13}$ is the
probability of drawing a card that is a king. These computations are very easy
because there are no difficulties in the determination of the appropriate values
of $r$ and $k$. However, instead of drawing only one card, suppose that five cards
are taken, at random and without replacement, from this deck. We can think
of each five-card hand as being an outcome in a sample space. It is reasonable
to assume that each of these outcomes has the same probability. Now
if $E_1$ is the set of outcomes in which each card of the hand is a spade,
$P(E_1)$ is equal to the number $r_1$ of all spade hands divided by the total number,
say $k$, of five-card hands. It is shown in many books on algebra that

$r_1 = \binom{13}{5} = \frac{13!}{5!\,8!}$ and $k = \binom{52}{5} = \frac{52!}{5!\,47!}$.

In general, if $n$ is a positive integer and if $x$ is a nonnegative integer with $x \le n$,
then the binomial coefficient

$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}$

is equal to the number of combinations of $n$ things taken $x$ at a time. If $x = 0$,
$0! = 1$, so that $\binom{n}{0} = 1$. Thus, in the special case involving $E_1$,

$P(E_1) = \frac{\binom{13}{5}}{\binom{52}{5}} = \frac{(13)(12)(11)(10)(9)}{(52)(51)(50)(49)(48)} = 0.0005$,

approximately. Next, let $E_2$ be the set of outcomes in which at least one card
is a spade. Then $E_2^*$ is the set of outcomes in which no card is a spade. There
are $r_2^* = \binom{39}{5}$ such outcomes. Hence

$P(E_2^*) = \frac{\binom{39}{5}}{\binom{52}{5}}$ and $P(E_2) = 1 - P(E_2^*)$.

Now suppose that $E_3$ is the set of outcomes in which exactly three cards are
kings and exactly two cards are queens. We can select the three kings in
any one of $\binom{4}{3}$ ways and the two queens in any one of $\binom{4}{2}$ ways. By a
well-known counting principle, the number of outcomes in $E_3$ is
$r_3 = \binom{4}{3}\binom{4}{2}$. Thus

$P(E_3) = \frac{\binom{4}{3}\binom{4}{2}}{\binom{52}{5}}$.

Finally, let $E_4$ be the set of outcomes in which
there are exactly two kings, two queens, and one jack. Then

$P(E_4) = \frac{\binom{4}{2}\binom{4}{2}\binom{4}{1}}{\binom{52}{5}}$

because the numerator of this fraction is the number of outcomes in $E_4$.
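
The five-card probabilities of Example 3 are ratios of binomial coefficients, so they are easy to evaluate directly. The sketch below assumes, as the example does, that all five-card hands are equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

k = comb(52, 5)  # total number of five-card hands

# E_1: every card in the hand is a spade.
p_all_spades = Fraction(comb(13, 5), k)

# E_2: at least one spade, computed through the complement (no spade at all).
p_no_spade = Fraction(comb(39, 5), k)
p_at_least_one_spade = 1 - p_no_spade

# E_3: exactly three kings and exactly two queens.
p_three_kings_two_queens = Fraction(comb(4, 3) * comb(4, 2), k)

# E_4: exactly two kings, two queens, and one jack.
p_two_kings_two_queens_one_jack = Fraction(
    comb(4, 2) * comb(4, 2) * comb(4, 1), k
)

for name, p in [
    ("all spades", p_all_spades),
    ("at least one spade", p_at_least_one_spade),
    ("three kings, two queens", p_three_kings_two_queens),
    ("two kings, two queens, one jack", p_two_kings_two_queens_one_jack),
]:
    print(f"{name}: {float(p):.6f}")
```

Working with `Fraction` keeps the ratios exact; conversion to `float` is done only for display.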

Example 3 and the previous discussion allow us to see one way in
which we can define a probability set function, that is, a set function
that satisfies the requirements of Definition 7. Suppose that our space
$\mathcal{C}$ consists of $k$ distinct points, which, for this discussion, we take to
be in a one-dimensional space. If the random experiment that ends in
one of those $k$ points is such that it is reasonable to assume that these
points are equally likely, we could assign $1/k$ to each point and let, for
$C \subset \mathcal{C}$,

$P(C) = \frac{\text{number of points in } C}{k} = \sum_{x \in C} f(x)$, where $f(x) = \frac{1}{k}$, $x \in \mathcal{C}$.

For illustration, in the cast of a die, we could take
$\mathcal{C} = \{1, 2, 3, 4, 5, 6\}$ and $f(x) = \frac{1}{6}$, $x \in \mathcal{C}$, if we believe the die to be
unbiased. Clearly, such a set function satisfies Definition 7.

The word unbiased in this illustration suggests the possibility that
all six points might not, in all such cases, be equally likely. As a matter
of fact, loaded dice do exist. In the case of a loaded die, some numbers
occur more frequently than others in a sequence of casts of that die.
For example, suppose that a die has been loaded so that the relative
frequencies of the numbers in $\mathcal{C}$ seem to stabilize proportional to the
number of spots that are on the up side. Thus we might assign
$f(x) = x/21$, $x \in \mathcal{C}$, and the corresponding

$P(C) = \sum_{x \in C} f(x)$

would satisfy Definition 7. For illustration, this means that if $C =
\{1, 2, 3\}$, then

$P(C) = \sum_{x=1}^{3} f(x) = \frac{1}{21} + \frac{2}{21} + \frac{3}{21} = \frac{6}{21} = \frac{2}{7}$.

Whether this probability set function is realistic can only be checked
by performing the random experiment a large number of times.
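
The loaded-die assignment can be written out as a small table and summed over any event. This is a sketch only: the dictionary `f` and the function `prob` are our own names for the point function $f(x) = x/21$ and the set function $P(C)$ described above.

```python
from fractions import Fraction

space = range(1, 7)                        # faces of the die
f = {x: Fraction(x, 21) for x in space}    # f(x) = x/21 for the loaded die

def prob(event):
    """P(C) = sum of f(x) over the points x in C."""
    return sum((f[x] for x in event), Fraction(0))

# Condition (c) of Definition 7: the whole space has probability 1.
assert prob(space) == 1
# Condition (a): every point carries nonnegative probability.
assert all(value >= 0 for value in f.values())

p_low = prob({1, 2, 3})
print(p_low)
```

Because the space is finite and `prob` is a sum over points, additivity over disjoint events holds automatically.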


EXERCISES 

1.17. A positive integer from one to six is to be chosen by casting a die. Thus
the elements $c$ of the sample space $\mathcal{C}$ are 1, 2, 3, 4, 5, 6. Let $C_1 = \{1, 2, 3, 4\}$,
$C_2 = \{3, 4, 5, 6\}$. If the probability set function $P$ assigns a probability of
$\frac{1}{6}$ to each of the elements of $\mathcal{C}$, compute $P(C_1)$, $P(C_2)$, $P(C_1 \cap C_2)$, and
$P(C_1 \cup C_2)$.

1.18. A random experiment consists of drawing a card from an ordinary deck
of 52 playing cards. Let the probability set function $P$ assign a probability
of $\frac{1}{52}$ to each of the 52 possible outcomes. Let $C_1$ denote the collection of
the 13 hearts and let $C_2$ denote the collection of the 4 kings. Compute $P(C_1)$,
$P(C_2)$, $P(C_1 \cap C_2)$, and $P(C_1 \cup C_2)$.

1.19. A coin is to be tossed as many times as necessary to turn up one head.
Thus the elements $c$ of the sample space $\mathcal{C}$ are H, TH, TTH, TTTH, and
so forth. Let the probability set function $P$ assign to these elements the
respective probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}$, and so forth. Show that $P(\mathcal{C}) = 1$. Let
$C_1 = \{c : c$ is H, TH, TTH, TTTH, or TTTTH$\}$. Compute $P(C_1)$. Let
$C_2 = \{c : c$ is TTTTH or TTTTTH$\}$. Compute $P(C_2)$, $P(C_1 \cap C_2)$, and
$P(C_1 \cup C_2)$.

1.20. If the sample space is $\mathcal{C} = C_1 \cup C_2$ and if $P(C_1) = 0.8$ and $P(C_2) = 0.5$,
find $P(C_1 \cap C_2)$.

1.21. Let the sample space be $\mathcal{C} = \{c : 0 < c < \infty\}$. Let $C \subset \mathcal{C}$ be defined by
$C = \{c : 4 < c < \infty\}$ and take $P(C) = \int_C e^{-x}\,dx$. Evaluate $P(C)$, $P(C^*)$, and
$P(C \cup C^*)$.

1.22. If the sample space is $\mathcal{C} = \{c : -\infty < c < \infty\}$ and if $C \subset \mathcal{C}$ is a set for
which the integral $\int_C e^{-|x|}\,dx$ exists, show that this set function is not a
probability set function. What constant do we multiply the integral by to
make it a probability set function?

1.23. If $C_1$ and $C_2$ are subsets of the sample space $\mathcal{C}$, show that

$P(C_1 \cap C_2) \le P(C_1) \le P(C_1 \cup C_2) \le P(C_1) + P(C_2)$.

1.24. Let $C_1$, $C_2$, and $C_3$ be three mutually disjoint subsets of the sample space
$\mathcal{C}$. Find $P[(C_1 \cup C_2) \cap C_3]$ and $P(C_1^* \cup C_2^*)$.

1.25. If $C_1$, $C_2$, and $C_3$ are subsets of $\mathcal{C}$, show that

$P(C_1 \cup C_2 \cup C_3) = P(C_1) + P(C_2) + P(C_3) - P(C_1 \cap C_2)
- P(C_1 \cap C_3) - P(C_2 \cap C_3) + P(C_1 \cap C_2 \cap C_3)$.

What is the generalization of this result to four or more subsets of $\mathcal{C}$?
Hint: Write $P(C_1 \cup C_2 \cup C_3) = P[C_1 \cup (C_2 \cup C_3)]$ and use Theorem 5.

Remark. In order to solve a number of exercises, like 1.26-1.31, certain
reasonable assumptions must be made.

1.26. A bowl contains 16 chips, of which 6 are red, 7 are white, and 3 are blue.
If four chips are taken at random and without replacement, find the
probability that: (a) each of the 4 chips is red; (b) none of the 4 chips is red;
(c) there is at least 1 chip of each color.

1.27. A person has purchased 10 of 1000 tickets sold in a certain raffle. To 
determine the five prize winners, 5 tickets are to be drawn at random and 
without replacement. Compute the probability that this person will win at 
least one prize. 

Hint: First compute the probability that the person does not win a prize. 

1.28. Compute the probability of being dealt at random and without
replacement a 13-card bridge hand consisting of: (a) 6 spades, 4 hearts, 2
diamonds, and 1 club; (b) 13 cards of the same suit.

1.29. Three distinct integers are chosen at random from the first 20 positive
integers. Compute the probability that: (a) their sum is even; (b) their
product is even.

1.30. There are 5 red chips and 3 blue chips in a bowl. The red chips are
numbered 1, 2, 3, 4, 5, respectively, and the blue chips are numbered 1, 2,
3, respectively. If 2 chips are to be drawn at random and without
replacement, find the probability that these chips have either the same
number or the same color.

1.31. In a lot of 50 light bulbs, there are 2 bad bulbs. An inspector examines
5 bulbs, which are selected at random and without replacement.

(a) Find the probability of at least 1 defective bulb among the 5.

(b) How many bulbs should he examine so that the probability of finding
at least 1 bad bulb exceeds $\frac{1}{2}$?

1.4 Conditional Probability and Independence 

In some random experiments, we are interested only in those
outcomes that are elements of a subset $C_1$ of the sample space $\mathcal{C}$. This
means, for our purposes, that the sample space is effectively the subset
$C_1$. We are now confronted with the problem of defining a probability
set function with $C_1$ as the "new" sample space.

Let the probability set function $P(C)$ be defined on the sample space
$\mathcal{C}$ and let $C_1$ be a subset of $\mathcal{C}$ such that $P(C_1) > 0$. We agree to consider
only those outcomes of the random experiment that are elements of $C_1$;
in essence, then, we take $C_1$ to be a sample space. Let $C_2$ be another
subset of $\mathcal{C}$. How, relative to the new sample space $C_1$, do we want to
define the probability of the event $C_2$? Once defined, this probability
is called the conditional probability of the event $C_2$, relative to the
hypothesis of the event $C_1$; or, more briefly, the conditional probability
of $C_2$, given $C_1$. Such a conditional probability is denoted by the symbol
$P(C_2|C_1)$. We now return to the question that was raised about the
definition of this symbol. Since $C_1$ is now the sample space, the only
elements of $C_2$ that concern us are those, if any, that are also elements
of $C_1$, that is, the elements of $C_1 \cap C_2$. It seems desirable, then, to define
the symbol $P(C_2|C_1)$ in such a way that

$P(C_1|C_1) = 1$ and $P(C_2|C_1) = P(C_1 \cap C_2|C_1)$.

Moreover, from a relative frequency point of view, it would seem
logically inconsistent if we did not require that the ratio of the
probabilities of the events $C_1 \cap C_2$ and $C_1$, relative to the space $C_1$, be
the same as the ratio of the probabilities of these events relative to the
space $\mathcal{C}$; that is, we should have

$\frac{P(C_1 \cap C_2|C_1)}{P(C_1|C_1)} = \frac{P(C_1 \cap C_2)}{P(C_1)}$.

These three desirable conditions imply that the relation

$P(C_2|C_1) = \frac{P(C_1 \cap C_2)}{P(C_1)}$

is a suitable definition of the conditional probability of the event $C_2$,
given the event $C_1$, provided that $P(C_1) > 0$. Moreover, we have

1. $P(C_2|C_1) \ge 0$.

2. $P(C_2 \cup C_3 \cup \cdots|C_1) = P(C_2|C_1) + P(C_3|C_1) + \cdots$, provided that
$C_2, C_3, \ldots$ are mutually disjoint sets.

3. $P(C_1|C_1) = 1$.

Properties (1) and (3) are evident; proof of property (2) is left as an
exercise (1.32). But these are precisely the conditions that a probability
set function must satisfy. Accordingly, $P(C_2|C_1)$ is a probability set
function, defined for subsets of $C_1$. It may be called the conditional
probability set function, relative to the hypothesis $C_1$; or the
conditional probability set function, given $C_1$. It should be noted that
this conditional probability set function, given $C_1$, is defined at this time
only when $P(C_1) > 0$.

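Example 1. A hand of 5 cards is to be dealt at random without
replacement from an ordinary deck of 52 playing cards. The conditional
probability of an all-spade hand ($C_2$), relative to the hypothesis that there are
at least 4 spades in the hand ($C_1$), is, since $C_1 \cap C_2 = C_2$,

$P(C_2|C_1) = \frac{P(C_2)}{P(C_1)} = \frac{\binom{13}{5} \big/ \binom{52}{5}}{\left[\binom{13}{4}\binom{39}{1} + \binom{13}{5}\right] \big/ \binom{52}{5}} = \frac{\binom{13}{5}}{\binom{13}{4}\binom{39}{1} + \binom{13}{5}}$.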
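
The conditional probability in this example can be evaluated with the defining ratio $P(C_2|C_1) = P(C_1 \cap C_2)/P(C_1)$. The sketch below assumes equally likely five-card hands; the function name `conditional` is hypothetical and is only a thin wrapper around the definition.

```python
from fractions import Fraction
from math import comb

k = comb(52, 5)  # number of equally likely five-card hands

# C2: all five cards are spades.
p_c2 = Fraction(comb(13, 5), k)
# C1: at least four spades (exactly four spades, or all five).
p_c1 = Fraction(comb(13, 4) * comb(39, 1) + comb(13, 5), k)

def conditional(p_joint, p_given):
    """P(A | B) = P(A n B) / P(B), defined only when P(B) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given

# Since C2 is a subset of C1, the intersection C1 n C2 is C2 itself.
p_c2_given_c1 = conditional(p_c2, p_c1)
print(p_c2_given_c1, float(p_c2_given_c1))
```

Note how the common factor $\binom{52}{5}$ cancels, so only the counts of favorable hands matter.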



From the definition of the conditional probability set function, we
observe that

$P(C_1 \cap C_2) = P(C_1)P(C_2|C_1)$.

This relation is frequently called the multiplication rule for probabilities.
Sometimes, after considering the nature of the random
experiment, it is possible to make reasonable assumptions so that both
$P(C_1)$ and $P(C_2|C_1)$ can be assigned. Then $P(C_1 \cap C_2)$ can be computed
under these assumptions. This will be illustrated in Examples 2 and 3.

Example 2. A bowl contains eight chips. Three of the chips are red and
the remaining five are blue. Two chips are to be drawn successively, at random
and without replacement. We want to compute the probability that the first
draw results in a red chip ($C_1$) and that the second draw results in a blue chip
($C_2$). It is reasonable to assign the following probabilities:

$P(C_1) = \frac{3}{8}$ and $P(C_2|C_1) = \frac{5}{7}$.

Thus, under these assignments, we have $P(C_1 \cap C_2) = \left(\frac{3}{8}\right)\left(\frac{5}{7}\right) = \frac{15}{56}$.

Example 3. From an ordinary deck of playing cards, cards are to be
drawn successively, at random and without replacement. The probability that
the third spade appears on the sixth draw is computed as follows. Let $C_1$ be
the event of two spades in the first five draws and let $C_2$ be the event of a spade
on the sixth draw. Thus the probability that we wish to compute is $P(C_1 \cap C_2)$.
It is reasonable to take

$P(C_1) = \frac{\binom{13}{2}\binom{39}{3}}{\binom{52}{5}}$ and $P(C_2|C_1) = \frac{11}{47}$.

The desired probability $P(C_1 \cap C_2)$ is then the product of these two numbers.
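
The multiplication rule in Examples 2 and 3 amounts to multiplying an unconditional probability by a conditional one. The sketch below carries out both computations under the assumptions stated in the examples (draws at random, without replacement); the cross-check by counting ordered draws is our own addition, not part of the text.

```python
from fractions import Fraction
from math import comb

# Example 2: red on the first draw, then blue on the second.
p_red_first = Fraction(3, 8)
p_blue_second_given_red = Fraction(5, 7)   # 5 blue among the 7 remaining chips
p_red_then_blue = p_red_first * p_blue_second_given_red

# Example 3: third spade appears on the sixth draw.
# C1: exactly two spades among the first five cards.
p_c1 = Fraction(comb(13, 2) * comb(39, 3), comb(52, 5))
# C2 given C1: 11 spades remain among the 47 remaining cards.
p_c2_given_c1 = Fraction(11, 47)
p_third_spade_on_sixth = p_c1 * p_c2_given_c1

# Cross-check by counting ordered six-card sequences: choose which two of the
# first five positions hold spades, then fill the positions in order.
ordered = comb(5, 2) * (13 * 12) * (39 * 38 * 37) * 11
total_ordered = 52 * 51 * 50 * 49 * 48 * 47
assert p_third_spade_on_sixth == Fraction(ordered, total_ordered)

print(p_red_then_blue, float(p_third_spade_on_sixth))
```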


The multiplication rule can be extended to three or more events. In
the case of three events, we have, by using the multiplication rule for
two events,

$P(C_1 \cap C_2 \cap C_3) = P[(C_1 \cap C_2) \cap C_3] = P(C_1 \cap C_2)P(C_3|C_1 \cap C_2)$.

But $P(C_1 \cap C_2) = P(C_1)P(C_2|C_1)$. Hence

$P(C_1 \cap C_2 \cap C_3) = P(C_1)P(C_2|C_1)P(C_3|C_1 \cap C_2)$.

This procedure can be used to extend the multiplication rule to four
or more events. The general formula for $k$ events can be proved by
mathematical induction.

Example 4. Four cards are to be dealt successively, at random and without
replacement, from an ordinary deck of playing cards. The probability
of receiving a spade, a heart, a diamond, and a club, in that order, is
$\left(\frac{13}{52}\right)\left(\frac{13}{51}\right)\left(\frac{13}{50}\right)\left(\frac{13}{49}\right)$.
This follows from the extension of the multiplication rule. In
this computation, the assumptions that are involved seem clear.

Let the space $\mathcal{C}$ be partitioned into $k$ mutually exclusive and
exhaustive events $C_1, C_2, \ldots, C_k$ such that $P(C_i) > 0$, $i = 1, 2, \ldots, k$.
Here the events $C_1, C_2, \ldots, C_k$ do not need to be equally likely. Let $C$
be another event such that $P(C) > 0$. Thus $C$ occurs with one and only
one of the events $C_1, C_2, \ldots, C_k$; that is,

$C = C \cap (C_1 \cup C_2 \cup \cdots \cup C_k)
= (C \cap C_1) \cup (C \cap C_2) \cup \cdots \cup (C \cap C_k)$.

Since $C \cap C_i$, $i = 1, 2, \ldots, k$, are mutually exclusive, we have

$P(C) = P(C \cap C_1) + P(C \cap C_2) + \cdots + P(C \cap C_k)$.

However, $P(C \cap C_i) = P(C_i)P(C|C_i)$, $i = 1, 2, \ldots, k$; so

$P(C) = P(C_1)P(C|C_1) + P(C_2)P(C|C_2) + \cdots + P(C_k)P(C|C_k)
= \sum_{i=1}^{k} P(C_i)P(C|C_i)$.

This result is sometimes called the law of total probability.

From the definition of conditional probability, we have, using the
law of total probability, that

$P(C_j|C) = \frac{P(C \cap C_j)}{P(C)} = \frac{P(C_j)P(C|C_j)}{\sum_{i=1}^{k} P(C_i)P(C|C_i)}$,

which is the well-known Bayes' theorem. This permits us to calculate
the conditional probability of $C_j$, given $C$, from the probabilities of
$C_1, C_2, \ldots, C_k$ and the conditional probabilities of $C$, given $C_i$,
$i = 1, 2, \ldots, k$.

Example 5. Say it is known that bowl $C_1$ contains 3 red and 7 blue chips
and bowl $C_2$ contains 8 red and 2 blue chips. All chips are identical in size and
shape. A die is cast and bowl $C_1$ is selected if five or six spots show on the side
that is up; otherwise, bowl $C_2$ is selected. In a notation that is fairly obvious,
it seems reasonable to assign $P(C_1) = \frac{2}{6}$ and $P(C_2) = \frac{4}{6}$. The selected bowl is
handed to another person and one chip is taken at random. Say that this chip
is red, an event which we denote by $C$. By considering the contents of the
bowls, it is reasonable to assign the conditional probabilities $P(C|C_1) = \frac{3}{10}$ and
$P(C|C_2) = \frac{8}{10}$. Thus the conditional probability of bowl $C_1$, given that a red
chip is drawn, is

$P(C_1|C) = \frac{P(C_1)P(C|C_1)}{P(C_1)P(C|C_1) + P(C_2)P(C|C_2)}
= \frac{\left(\frac{2}{6}\right)\left(\frac{3}{10}\right)}{\left(\frac{2}{6}\right)\left(\frac{3}{10}\right) + \left(\frac{4}{6}\right)\left(\frac{8}{10}\right)} = \frac{3}{19}$.

In a similar manner, we have $P(C_2|C) = \frac{16}{19}$.

In Example 5, the probabilities $P(C_1) = \frac{2}{6}$ and $P(C_2) = \frac{4}{6}$ are called
prior probabilities of $C_1$ and $C_2$, respectively, because they are known
to be due to the random mechanism used to select the bowls. After the
chip is taken and observed to be red, the conditional probabilities
$P(C_1|C) = \frac{3}{19}$ and $P(C_2|C) = \frac{16}{19}$ are called posterior probabilities. Since
$C_2$ has a larger proportion of red chips than does $C_1$, it appeals to one's
intuition that $P(C_2|C)$ should be larger than $P(C_2)$ and, of course,
$P(C_1|C)$ should be smaller than $P(C_1)$. That is, intuitively the
chances of having bowl $C_2$ are better once a red chip is observed
than before a chip is taken. Bayes' theorem provides a method of
determining exactly what those probabilities are.

Example 6. Three plants, $C_1$, $C_2$, and $C_3$, produce respectively 10, 50, and
40 percent of a company's output. Although plant $C_1$ is a small plant, its
manager believes in high quality and only 1 percent of its products are
defective. The other two, $C_2$ and $C_3$, are worse and produce items that are 3
and 4 percent defective, respectively. All products are sent to a central
warehouse. One item is selected at random and observed to be defective, say
event $C$. The conditional probability that it comes from plant $C_1$ is found as
follows. It is natural to assign the respective prior probabilities of getting an
item from the plants as $P(C_1) = 0.1$, $P(C_2) = 0.5$, and $P(C_3) = 0.4$, while the
conditional probabilities of defective are $P(C|C_1) = 0.01$, $P(C|C_2) = 0.03$, and
$P(C|C_3) = 0.04$. Thus the posterior probability of $C_1$, given a defective, is

$P(C_1|C) = \frac{P(C_1 \cap C)}{P(C)} = \frac{(0.10)(0.01)}{(0.10)(0.01) + (0.50)(0.03) + (0.40)(0.04)}$,

which equals $\frac{1}{32}$; this is much smaller than the prior probability $P(C_1) = \frac{1}{10}$.
This is as it should be because the fact that the item is defective decreases
the chances that it comes from the high-quality plant $C_1$.
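
Bayes' theorem, together with the law of total probability in its denominator, can be packaged as one small function. The following is a sketch under the assumptions of Examples 5 and 6 (a finite partition with known priors and conditional probabilities); the function name `posterior` and its argument names are ours.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return [P(C_j | C)] given priors P(C_j) and likelihoods P(C | C_j).

    The priors must describe a partition, so they have to sum to 1.
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per prior")
    if sum(priors) != 1:
        raise ValueError("priors of a partition must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(C_j)P(C | C_j)
    total = sum(joint)                                    # law of total probability
    if total == 0:
        raise ValueError("the observed event has probability zero")
    return [j / total for j in joint]

# Example 5: two bowls, red chip observed.
bowls = posterior(
    [Fraction(2, 6), Fraction(4, 6)],
    [Fraction(3, 10), Fraction(8, 10)],
)

# Example 6: three plants, defective item observed.
plants = posterior(
    [Fraction(10, 100), Fraction(50, 100), Fraction(40, 100)],
    [Fraction(1, 100), Fraction(3, 100), Fraction(4, 100)],
)

print(bowls, plants)
```

The posteriors always sum to 1, which is a convenient sanity check on any hand computation.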

Sometimes it happens that the occurrence of event $C_1$ does not
change the probability of event $C_2$; that is, when $P(C_1) > 0$,

$P(C_2|C_1) = P(C_2)$.

In this case, we say that the events $C_1$ and $C_2$ are independent. Moreover,
the multiplication rule becomes

$P(C_1 \cap C_2) = P(C_1)P(C_2|C_1) = P(C_1)P(C_2)$.

This, in turn, implies, when $P(C_2) > 0$, that

$P(C_1|C_2) = \frac{P(C_1 \cap C_2)}{P(C_2)} = \frac{P(C_1)P(C_2)}{P(C_2)} = P(C_1)$.


Remark. Events that are independent are sometimes called statistically
independent, stochastically independent, or independent in a probability sense.
In most instances, we use independent without a modifier if there is no
possibility of misunderstanding.


It is interesting to note that $C_1$ and $C_2$ are independent if $P(C_1) = 0$
or $P(C_2) = 0$ because then $P(C_1 \cap C_2) = 0$ since $(C_1 \cap C_2) \subset C_1$ and
$(C_1 \cap C_2) \subset C_2$. Thus the left- and right-hand members of

$P(C_1 \cap C_2) = P(C_1)P(C_2)$

are both equal to zero and are, of course, equal to each other. Also,
if $C_1$ and $C_2$ are independent events, so are the three pairs: $C_1$ and $C_2^*$,
$C_1^*$ and $C_2$, and $C_1^*$ and $C_2^*$ (see Exercise 1.41).

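Example 7. A red die and a white die are cast in such a way that the
number of spots on the two sides that are up are independent events. If $C_1$
represents a four on the red die and $C_2$ represents a three on the white die,
with an equally likely assumption for each side, we assign $P(C_1) = \frac{1}{6}$ and
$P(C_2) = \frac{1}{6}$. Thus, from independence, the probability of the ordered pair
(red = 4, white = 3) is

$P[(4, 3)] = \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{1}{36}$.

The probability that the sum of the up spots of the two dice equals seven is

$P[(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
= \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) + \left(\frac{1}{6}\right)\left(\frac{1}{6}\right) = \frac{6}{36}$.

In a similar manner, it is easy to show that the probabilities of the sums of
2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 are, respectively,

$\frac{1}{36}, \frac{2}{36}, \frac{3}{36}, \frac{4}{36}, \frac{5}{36}, \frac{6}{36}, \frac{5}{36}, \frac{4}{36}, \frac{3}{36}, \frac{2}{36}, \frac{1}{36}$.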
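
The probabilities of the sums in Example 7 can be tabulated by enumerating the ordered pairs. This sketch assumes, as in the example, independent dice with equally likely faces, so that each ordered pair has probability $(1/6)(1/6) = 1/36$; the names `counts` and `sum_probs` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many ordered pairs (red, white) give each possible sum.
counts = Counter(red + white for red, white in product(range(1, 7), repeat=2))

# Each ordered pair has probability 1/36 under independence and equal likelihood.
sum_probs = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

for s, p in sum_probs.items():
    print(s, p)
```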

Suppose now that we have three events, $C_1$, $C_2$, and $C_3$. We say
that they are mutually independent if and only if they are pairwise
independent:

$P(C_1 \cap C_3) = P(C_1)P(C_3)$, $P(C_1 \cap C_2) = P(C_1)P(C_2)$,

$P(C_2 \cap C_3) = P(C_2)P(C_3)$,

and

$P(C_1 \cap C_2 \cap C_3) = P(C_1)P(C_2)P(C_3)$.

More generally, the $n$ events $C_1, C_2, \ldots, C_n$ are mutually independent
if and only if for every collection of $k$ of these events, $2 \le k \le n$, the
following is true:

Say that $d_1, d_2, \ldots, d_k$ are $k$ distinct integers from $1, 2, \ldots, n$; then

$P(C_{d_1} \cap C_{d_2} \cap \cdots \cap C_{d_k}) = P(C_{d_1})P(C_{d_2}) \cdots P(C_{d_k})$.

In particular, if $C_1, C_2, \ldots, C_n$ are mutually independent, then

$P(C_1 \cap C_2 \cap \cdots \cap C_n) = P(C_1)P(C_2) \cdots P(C_n)$.

Also, as with two sets, many combinations of these events and their
complements are independent, such as

$C_1^*$ and $(C_2 \cup C_3^* \cup C_4)$ are independent;

$C_1 \cup C_2^*$, $C_3^*$, and $C_4 \cap C_5^*$ are mutually independent.

If there is no possibility of misunderstanding, independent is often used
without the modifier mutually when considering more than two events.
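
The definition of mutual independence asks for the three-way product condition in addition to the pairwise ones, and the extra condition is not redundant. The sketch below uses a standard illustration that is not taken from the text: two tosses of a fair coin, with $C_3$ defined (by us) as the event of exactly one head. The three events are pairwise independent but fail the three-way condition.

```python
from fractions import Fraction
from itertools import product

# Two tosses of a coin; the four ordered outcomes are assumed equally likely.
space = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(space))

C1 = {c for c in space if c[0] == "H"}    # head on the first toss
C2 = {c for c in space if c[1] == "H"}    # head on the second toss
C3 = {c for c in space if c[0] != c[1]}   # exactly one head (our choice of event)

# Pairwise conditions: P(A n B) = P(A)P(B) for each of the three pairs.
pairwise = all(
    prob(a & b) == prob(a) * prob(b)
    for a, b in [(C1, C2), (C1, C3), (C2, C3)]
)

# Three-way condition: P(C1 n C2 n C3) = P(C1)P(C2)P(C3).
triple = prob(C1 & C2 & C3) == prob(C1) * prob(C2) * prob(C3)

print(pairwise, triple)
```

Here the intersection of all three events is empty, so its probability is 0 rather than $\frac{1}{8}$.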

We often perform a sequence of random experiments in such a way
that the events associated with one of them are independent of the
events associated with the others. For convenience, we refer to these
experiments as independent experiments, meaning that the respective events
are independent. Thus we often refer to independent flips of a coin or
independent casts of a die or, more generally, independent trials of
some given random experiment.

Example 8. A coin is flipped independently several times. Let the event $C_i$
represent a head (H) on the $i$th toss; thus $C_i^*$ represents a tail (T). Assume that
$C_i$ and $C_i^*$ are equally likely; that is, $P(C_i) = P(C_i^*) = \frac{1}{2}$. Thus the probability
of an ordered sequence like HHTH is, from independence,

$P(C_1 \cap C_2 \cap C_3^* \cap C_4) = P(C_1)P(C_2)P(C_3^*)P(C_4) = \left(\frac{1}{2}\right)^4 = \frac{1}{16}$.

Similarly, the probability of observing the first head on the third flip is

$P(C_1^* \cap C_2^* \cap C_3) = P(C_1^*)P(C_2^*)P(C_3) = \left(\frac{1}{2}\right)^3 = \frac{1}{8}$.

Also, the probability of getting at least one head on four flips is

$P(C_1 \cup C_2 \cup C_3 \cup C_4) = 1 - P[(C_1 \cup C_2 \cup C_3 \cup C_4)^*]
= 1 - P(C_1^* \cap C_2^* \cap C_3^* \cap C_4^*)
= 1 - \left(\frac{1}{2}\right)^4 = \frac{15}{16}$.

See Exercise 1.43 to justify this last probability.
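
The three computations in this example follow one pattern: multiply the per-flip probabilities, and use the complement for "at least one." The sketch below assumes independent flips with $P(\text{H}) = \frac{1}{2}$; the helper names `sequence_prob` and `at_least_one` are ours. A brute-force sum over all 16 sequences is included as a check.

```python
from fractions import Fraction
from itertools import product
from math import prod

half = Fraction(1, 2)

def sequence_prob(seq, p_head=half):
    """Probability of an ordered sequence of independent flips, e.g. 'HHTH'."""
    return prod(p_head if face == "H" else 1 - p_head for face in seq)

def at_least_one(probs):
    """P(C_1 u ... u C_k) = 1 - (1 - p_1)...(1 - p_k) for independent events."""
    return 1 - prod(1 - p for p in probs)

p_hhth = sequence_prob("HHTH")
p_first_head_on_third = sequence_prob("TTH")
p_some_head = at_least_one([half] * 4)

# Brute force: add up the probabilities of all four-flip sequences with a head.
brute = sum(sequence_prob(s) for s in product("HT", repeat=4) if "H" in s)
assert brute == p_some_head

print(p_hhth, p_first_head_on_third, p_some_head)
```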

EXERCISES 

1.32. If $P(C_1) > 0$ and if $C_2, C_3, C_4, \ldots$ are mutually disjoint sets, show that
$P(C_2 \cup C_3 \cup \cdots|C_1) = P(C_2|C_1) + P(C_3|C_1) + \cdots$.

1.33. Prove that

$P(C_1 \cap C_2 \cap C_3 \cap C_4) = P(C_1)P(C_2|C_1)P(C_3|C_1 \cap C_2)P(C_4|C_1 \cap C_2 \cap C_3)$.

1.34. A bowl contains 8 chips. Three of the chips are red and 5 are blue. Four
chips are to be drawn successively at random and without replacement.
(a) Compute the probability that the colors alternate.
(b) Compute the probability that the first blue chip appears on the third
draw.

1.35. A hand of 13 cards is to be dealt at random and without replacement
from an ordinary deck of playing cards. Find the conditional probability
that there are at least three kings in the hand relative to the hypothesis that
the hand contains at least two kings.

1.36. A drawer contains eight pairs of socks. If six socks are taken at random
and without replacement, compute the probability that there is at least one
matching pair among these six socks.

Hint: Compute the probability that there is not a matching pair.

1.37. A bowl contains 10 chips. Four of the chips are red, 5 are white, and
1 is blue. If 3 chips are taken at random and without replacement, compute
the conditional probability that there is 1 chip of each color relative to the
hypothesis that there is exactly 1 red chip among the 3.

1.38. Bowl I contains 3 red chips and 7 blue chips. Bowl II contains 6 red chips
and 4 blue chips. A bowl is selected at random and then 1 chip is drawn
from this bowl.

(a) Compute the probability that this chip is red.

(b) Relative to the hypothesis that the chip is red, find the conditional
probability that it is drawn from bowl II.

1.39. Bowl I contains 6 red chips and 4 blue chips. Five of these 10 chips are
selected at random and without replacement and put in bowl II, which was
originally empty. One chip is then drawn at random from bowl II. Relative
to the hypothesis that this chip is blue, find the conditional probability that
2 red chips and 3 blue chips are transferred from bowl I to bowl II.

1.40. A professor of statistics has two boxes of computer disks: box $C_1$
contains seven Verbatim disks and three Control Data disks and box $C_2$
contains two Verbatim disks and eight Control Data disks. She selects a box
at random with probabilities $P(C_1) = \frac{2}{3}$ and $P(C_2) = \frac{1}{3}$ because of their
respective locations. A disk is then selected at random and the event $C$
occurs if it is from Control Data. Using an equally likely assumption for
each disk in the selected box, compute $P(C_1|C)$ and $P(C_2|C)$.

1.41. If $C_1$ and $C_2$ are independent events, show that the following pairs of
events are also independent: (a) $C_1$ and $C_2^*$, (b) $C_1^*$ and $C_2$, and (c) $C_1^*$ and
$C_2^*$.

Hint: In (a), write $P(C_1 \cap C_2^*) = P(C_1)P(C_2^*|C_1) = P(C_1)[1 - P(C_2|C_1)]$.
From independence of $C_1$ and $C_2$, $P(C_2|C_1) = P(C_2)$.

1.42. Let $C_1$ and $C_2$ be independent events with $P(C_1) = 0.6$ and $P(C_2) = 0.3$.
Compute (a) $P(C_1 \cap C_2)$; (b) $P(C_1 \cup C_2)$; (c) $P(C_1 \cup C_2^*)$.

1.43. Generalize Exercise 1.4 to obtain

$(C_1 \cup C_2 \cup \cdots \cup C_k)^* = C_1^* \cap C_2^* \cap \cdots \cap C_k^*$.

Say that $C_1, C_2, \ldots, C_k$ are independent events that have respective
probabilities $p_1, p_2, \ldots, p_k$. Argue that the probability of at least one of
$C_1, C_2, \ldots, C_k$ is equal to

$1 - (1 - p_1)(1 - p_2) \cdots (1 - p_k)$.

1.44. Each of four persons fires one shot at a target. Let $C_k$ denote the event
that the target is hit by person $k$, $k = 1, 2, 3, 4$. If $C_1$, $C_2$, $C_3$, $C_4$ are
independent and if $P(C_1) = P(C_2) = 0.7$, $P(C_3) = 0.9$, and $P(C_4) = 0.4$,
compute the probability that (a) all of them hit the target; (b) exactly one
hits the target; (c) no one hits the target; (d) at least one hits the target.

1.45. A bowl contains three red (R) balls and seven white (W) balls of exactly
the same size and shape. Select balls successively at random and with
replacement so that the events of white on the first trial, white on the second,
and so on, can be assumed to be independent. In four trials, make certain
assumptions and compute the probabilities of the following ordered
sequences: (a) WWRW; (b) RWWW; (c) WWWR; and (d) WRWW.
Compute the probability of exactly one red ball in the four trials.

1.46. A coin is tossed two independent times, each resulting in a tail (T) or
a head (H). The sample space consists of four ordered pairs: TT, TH, HT,
HH. Making certain assumptions, compute the probability of each of these
ordered pairs. What is the probability of at least one head?

1.5 Random Variables of the Discrete Type

The reader will perceive that a sample space $\mathcal{C}$ may be tedious to
describe if the elements of $\mathcal{C}$ are not numbers. We shall now discuss
how we may formulate a rule, or a set of rules, by which the elements
$c$ of $\mathcal{C}$ may be represented by numbers. We begin the discussion with
a very simple example. Let the random experiment be the toss of a coin
and let the sample space associated with the experiment be
$\mathcal{C} = \{c :$ where $c$ is T or $c$ is H$\}$ and T and H represent, respectively,
tails and heads. Let $X$ be a function such that $X(c) = 0$ if $c$ is T and
let $X(c) = 1$ if $c$ is H. Thus $X$ is a real-valued function defined on the
sample space $\mathcal{C}$ which takes us from the sample space $\mathcal{C}$ to a space of
real numbers $\mathcal{A} = \{0, 1\}$. We call $X$ a random variable and, in this
example, the space associated with $X$ is $\mathcal{A} = \{0, 1\}$. We now formulate
the definition of a random variable and its space.

Definition 8. Consider a random experiment with a sample
space 𝒞. A function X, which assigns to each element c ∈ 𝒞 one and
only one real number X(c) = x, is called a random variable. The space
of X is the set of real numbers 𝒜 = {x : x = X(c), c ∈ 𝒞}.

It may be that the set 𝒞 has elements which are themselves
real numbers. In such an instance we could write X(c) = c so that
𝒜 = 𝒞.

Let X be a random variable that is defined on a sample space 𝒞,
and let 𝒜 be the space of X. Further, let A be a subset of 𝒜. Just as
we used the terminology "the event C," with C ⊂ 𝒞, we shall now
speak of "the event A." The probability P(C) of the event C has been
defined. We wish now to define the probability of the event A. This
probability will be denoted by Pr (X ∈ A), where Pr is an abbreviation
for "the probability that." With A a subset of 𝒜, let C be that subset
of 𝒞 such that C = {c : c ∈ 𝒞 and X(c) ∈ A}. Thus C has as its elements
all outcomes in 𝒞 for which the random variable X has a value that
is in A. This prompts us to define, as we now do, Pr (X ∈ A) to be equal
to P(C), where C = {c : c ∈ 𝒞 and X(c) ∈ A}. Thus Pr (X ∈ A) is an
assignment of probability to a set A, which is a subset of the space 𝒜
associated with the random variable X. This assignment is determined
by the probability set function P and the random variable X and is
sometimes denoted by P_X(A). That is,

\[ \Pr(X \in A) = P_X(A) = P(C), \]

where C = {c : c ∈ 𝒞 and X(c) ∈ A}. Thus a random variable X is a
function that carries the probability from a sample space 𝒞 to a space
𝒜 of real numbers. In this sense, with A ⊂ 𝒜, the probability P_X(A)
is often called an induced probability.
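
The construction of an induced probability can be sketched in a few lines of
Python. This is an illustration only: the names `induced_probability`, `P`,
and `X` are hypothetical, the sample space is the single coin toss used at the
start of this section, and the assignment P({T}) = P({H}) = 1/2 is an
assumption made for the example rather than something stated in the text.

```python
from fractions import Fraction

# Sample space of one coin toss, with an ASSUMED probability for each outcome.
P = {"T": Fraction(1, 2), "H": Fraction(1, 2)}

def X(c):
    """Random variable from the text: X(T) = 0 and X(H) = 1."""
    return 0 if c == "T" else 1

def induced_probability(A, P=P, X=X):
    """Return P_X(A) = P(C), where C = {c : X(c) is in A}."""
    C = [c for c in P if X(c) in A]      # outcomes carried into A by X
    return sum((P[c] for c in C), Fraction(0))

space = {X(c) for c in P}                # the space of X, here {0, 1}
print(induced_probability({1}))          # Pr(X = 1)
print(induced_probability(space))        # probability of the whole space
```

Exact fractions are used so that sums of probabilities can be compared with
equality instead of a floating-point tolerance.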


Remark. In a more advanced course, it would be noted that the random
variable X is a Borel measurable function. This is needed to assure that we
can find the induced probabilities on the sigma field of the subsets of 𝒜. We
need this requirement throughout this book for every function that is a
random variable, but no further mention of it is made.

The function P_X(A) satisfies the conditions (a), (b), and (c) of the
definition of a probability set function (Section 1.3). That is, P_X(A) is
also a probability set function. Conditions (a) and (c) are easily verified
by observing, for an appropriate C, that

\[ P_X(A) = P(C) \ge 0, \]

and that 𝒞 = {c : c ∈ 𝒞 and X(c) ∈ 𝒜} requires

\[ P_X(\mathcal{A}) = P(\mathcal{C}) = 1. \]

In discussing the condition (b), let us restrict our attention to the two
mutually exclusive events A_1 and A_2. Here P_X(A_1 ∪ A_2) = P(C), where
C = {c : c ∈ 𝒞 and X(c) ∈ A_1 ∪ A_2}. However,

\[ C = \{c : c \in \mathcal{C} \text{ and } X(c) \in A_1\} \cup \{c : c \in \mathcal{C} \text{ and } X(c) \in A_2\}, \]

or, for brevity, C = C_1 ∪ C_2. But C_1 and C_2 are disjoint sets. This must
be so, for if some c were common, say c_i, then X(c_i) ∈ A_1 and X(c_i) ∈ A_2.
That is, the same number X(c_i) belongs to both A_1 and A_2. This is a
contradiction because A_1 and A_2 are disjoint sets. Accordingly,

\[ P(C) = P(C_1) + P(C_2). \]

However, by definition, P(C_1) is P_X(A_1) and P(C_2) is P_X(A_2) and thus

\[ P_X(A_1 \cup A_2) = P_X(A_1) + P_X(A_2). \]

This is condition (b) for two disjoint sets.

Thus each of P_X(A) and P(C) is a probability set function. But the
reader should fully recognize that the probability set function P is
defined for subsets C of 𝒞, whereas P_X is defined for subsets A of 𝒜,
and, in general, they are not the same set function. Nevertheless, they
are closely related and some authors even drop the index X and write
P(A) for P_X(A). They think it is quite clear that P(A) means the
probability of A, a subset of 𝒜, and P(C) means the probability of C,
a subset of 𝒞. From this point on, we shall adopt this convention and
simply write P(A).

Perhaps an additional example will be helpful. Let a coin be
tossed two independent times and let our interest be in the number
of heads to be observed. Thus the sample space is 𝒞 = {c : where c is
TT or TH or HT or HH}. Let X(c) = 0 if c is TT; let X(c) = 1 if c
is either TH or HT; and let X(c) = 2 if c is HH. Thus the space of
the random variable X is 𝒜 = {0, 1, 2}. Consider the subset A of the
space 𝒜, where A = {1}. How is the probability of the event A
defined? We take the subset C of 𝒞 to have as its elements all
outcomes in 𝒞 for which the random variable X has a value that is an
element of A. Because X(c) = 1 if c is either TH or HT, then
C = {c : where c is TH or HT}. Thus P(A) = Pr (X ∈ A) = P(C). Since
A = {1}, then P(A) = Pr (X ∈ A) can be written more simply as
Pr (X = 1). Let C_1 = {c : c is TT}, C_2 = {c : c is TH}, C_3 = {c : c is HT},
and C_4 = {c : c is HH} denote subsets of 𝒞. From independence and
equally likely assumptions (see Exercise 1.46), our probability set
function P(C) assigns a probability of 1/4 to each of the sets C_i,
i = 1, 2, 3, 4. Then P(C_1) = 1/4, P(C_2 ∪ C_3) = 1/4 + 1/4 = 1/2, and
P(C_4) = 1/4. Let us now point out how much simpler it is to couch these
statements in a language that involves the random variable X. Because X
is the number of heads to be observed in tossing a coin two times, we have

    Pr (X = 0) = 1/4, since P(C_1) = 1/4;

    Pr (X = 1) = 1/2, since P(C_2 ∪ C_3) = 1/2;

and

    Pr (X = 2) = 1/4, since P(C_4) = 1/4.

This may be further condensed in the following table:

    x             0      1      2
    Pr (X = x)   1/4    1/2    1/4

This table depicts the distribution of probability over the elements
of 𝒜, the space of the random variable X. This can be written more
simply as

\[ \Pr(X = x) = \binom{2}{x}\left(\frac{1}{2}\right)^{2}, \qquad x \in \mathcal{A}. \]

Example 1. Consider a sequence of independent flips of a coin, each
resulting in a head (H) or a tail (T). Moreover, on each flip, we assume that
H and T are equally likely, that is, P(H) = P(T) = 1/2. The sample space 𝒞
consists of sequences like TTHTHHT···. Let the random variable X equal
the number of flips needed to obtain the first head. For this given sequence,
X = 3. Clearly, the space of X is 𝒜 = {1, 2, 3, 4, ...}. We see that X = 1 when
the sequence begins with an H and thus Pr (X = 1) = 1/2. Likewise, X = 2 when
the sequence begins with TH, which has probability Pr (X = 2) = (1/2)(1/2) = 1/4
from the independence. More generally, if X = x, where x = 1, 2, 3, 4, ...,
there must be a string of x − 1 tails followed by a head, that is, TT···TH,
where there are x − 1 tails in TT···T. Thus, from independence, we have

\[ \Pr(X = x) = \left(\frac{1}{2}\right)^{x-1}\left(\frac{1}{2}\right) = \left(\frac{1}{2}\right)^{x}, \qquad x = 1, 2, 3, \ldots . \]

Let us make some observations about these three illustrations of
a random variable. In each case the number of points in the space
was finite, as with {0, 1} and {0, 1, 2}, or countable, as with
{1, 2, 3, ...}. There was a function, say f(x) = Pr (X = x), that
described how the probability is distributed over the space 𝒜. In each
of these illustrations, there is a simple formula (although that is not
necessary in general) for that function, namely:


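\[ f(x) = \frac{1}{2}, \qquad x \in \{0, 1\}, \]

\[ f(x) = \binom{2}{x}\left(\frac{1}{2}\right)^{2}, \qquad x \in \{0, 1, 2\}, \]

and

\[ f(x) = \left(\frac{1}{2}\right)^{x}, \qquad x \in \{1, 2, 3, \ldots\}. \]

Moreover, the sum of f(x) over all x ∈ 𝒜 equals 1:

\[ \sum_{x=0}^{1} \frac{1}{2} = 1, \qquad \sum_{x=0}^{2} \binom{2}{x}\left(\frac{1}{2}\right)^{2} = 1, \qquad \sum_{x=1}^{\infty} \left(\frac{1}{2}\right)^{x} = \frac{1/2}{1 - 1/2} = 1. \]

Finally, if A ⊂ 𝒜, we can compute the probability of X ∈ A by the
summation

\[ \Pr(X \in A) = \sum_{A} f(x). \]

For illustrations, using the random variable of Example 1,

\[ \Pr(X = 1, 2, 3) = \sum_{x=1}^{3} \left(\frac{1}{2}\right)^{x} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} = \frac{7}{8} \]

and

\[ \Pr(X = 1, 3, 5, \ldots) = \frac{1}{2} + \left(\frac{1}{2}\right)^{3} + \left(\frac{1}{2}\right)^{5} + \cdots = \frac{1/2}{1 - 1/4} = \frac{2}{3}. \]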
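
The probabilities in these illustrations can be checked with exact rational
arithmetic. In the sketch below the helper name `f` and the truncation point
`n_terms` are choices made for illustration; the infinite sum over odd x is
only approximated by a partial sum, and its gap from 2/3 is computed exactly.

```python
from fractions import Fraction

def f(x):
    """p.d.f. of Example 1: f(x) = (1/2)^x for x = 1, 2, 3, ..."""
    return Fraction(1, 2) ** x

# Pr(X = 1, 2, 3) is a finite sum, so it can be evaluated exactly.
p_123 = sum(f(x) for x in (1, 2, 3))

# Pr(X = 1, 3, 5, ...) is an infinite series; sum the first n_terms odd values.
n_terms = 20
p_odd_partial = sum(f(x) for x in range(1, 2 * n_terms, 2))
gap = Fraction(2, 3) - p_odd_partial     # remainder of the geometric series

print(p_123)
print(float(p_odd_partial), float(gap))
```

The remainder shrinks by a factor of 4 with each additional odd term, which is
why a modest number of terms already agrees with 2/3 to many decimal places.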

We have special names for this type of random variable X and for a
function f(x) like that in each of these three illustrations, which we
now give.

Let X denote a random variable with a one-dimensional space 𝒜.
Suppose that 𝒜 consists of a countable number of points; that is, 𝒜
contains a finite number of points or the points of 𝒜 can be put into
a one-to-one correspondence with the positive integers. Such a space
𝒜 is called a discrete set of points. Let a function f(x) be such that
f(x) > 0, x ∈ 𝒜, and

\[ \sum_{\mathcal{A}} f(x) = 1. \]

Whenever a probability set function P(A), A ⊂ 𝒜, can be expressed
in terms of such an f(x) by

\[ P(A) = \Pr(X \in A) = \sum_{A} f(x), \]

then X is called a random variable of the discrete type and f(x) is called
the probability density function of X. Hereafter the probability density
function is abbreviated p.d.f.

Our notation can be simplified somewhat so that we do not need
to spell out the space in each instance. For illustration, let the random
variable be the number of flips necessary to obtain the first head. We
now extend the definition of the p.d.f. from 𝒜 = {1, 2, 3, ...} to
all the real numbers by writing

\[ f(x) = \begin{cases} \left(\frac{1}{2}\right)^{x}, & x = 1, 2, 3, \ldots, \\ 0 & \text{elsewhere.} \end{cases} \]

From such a function, we see that the space 𝒜 is clearly the set of
positive integers, which is a discrete set of points. Thus the
corresponding random variable is one of the discrete type.


Example 2. A lot, consisting of 100 fuses, is inspected by the following
procedure. Five of these fuses are chosen at random and tested; if all 5 "blow"
at the correct amperage, the lot is accepted. If, in fact, there are 20 defective
fuses in the lot, the probability of accepting the lot is, under appropriate
assumptions,

\[ \frac{\binom{80}{5}}{\binom{100}{5}} = 0.32, \]

approximately. More generally, let the random variable X be the number of
defective fuses among the 5 that are inspected. The p.d.f. of X is given by

\[ f(x) = \Pr(X = x) = \begin{cases} \dfrac{\binom{20}{x}\binom{80}{5-x}}{\binom{100}{5}}, & x = 0, 1, 2, 3, 4, 5, \\ 0 & \text{elsewhere.} \end{cases} \]

Clearly, the space of X is 𝒜 = {0, 1, 2, 3, 4, 5}. Thus this is an example of
a random variable of the discrete type whose distribution is an illustration of
a hypergeometric distribution.
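
The fuse-inspection probabilities can be reproduced with binomial
coefficients from the standard library. The sketch below assumes, as in the
example, 100 fuses of which 20 are defective and 5 are inspected; the variable
names `N`, `D`, `n`, and `accept` are hypothetical labels introduced here.

```python
from fractions import Fraction
from math import comb

N, D, n = 100, 20, 5      # lot size, defective fuses, fuses inspected

def f(x):
    """Hypergeometric p.d.f.: probability of x defective fuses among the n inspected."""
    if x not in range(n + 1):
        return Fraction(0)            # zero elsewhere
    return Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n))

accept = f(0)                          # lot is accepted when no inspected fuse is defective
print(float(accept))
print(sum(f(x) for x in range(n + 1)))
```

Keeping the values as fractions makes it possible to confirm that the six
probabilities add to exactly 1.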

Let the random variable X have the probability set function P(A),
where A is a one-dimensional set. Take x to be a real number
and consider the set A which is an unbounded set from −∞ to x,
including the point x itself. For all such sets A we have
P(A) = Pr (X ∈ A) = Pr (X ≤ x). This probability depends on the
point x; that is, this probability is a function of the point x. This point
function is denoted by the symbol F(x) = Pr (X ≤ x). The function
F(x) is called the distribution function (sometimes, cumulative
distribution function) of the random variable X. Since
F(x) = Pr (X ≤ x), then, with f(x) the p.d.f., we have

\[ F(x) = \sum_{w \le x} f(w), \]

for the discrete type.

Example 3. Let the random variable X of the discrete type have the p.d.f.
f(x) = x/6, x = 1, 2, 3, zero elsewhere. The distribution function of X is

\[ F(x) = \begin{cases} 0, & x < 1, \\ \frac{1}{6}, & 1 \le x < 2, \\ \frac{3}{6}, & 2 \le x < 3, \\ 1, & 3 \le x. \end{cases} \]

Here, as depicted in Figure 1.3, F(x) is a step function that is constant in every
interval not containing 1, 2, or 3, but has steps of heights 1/6, 2/6, and 3/6, which
are the probabilities at those respective points. It is also seen that F(x) is
everywhere continuous from the right. The p.d.f. of X is displayed as a bar
graph in Figure 1.4. We see that f(x) represents the probability at each x
while F(x) cumulates all the probability of points that are less than or equal
to x.

FIGURE 1.3 [graph of the distribution function F(x) against x]

FIGURE 1.4 [bar graph of the p.d.f. f(x) at x = 1, 2, 3]

Thus we can compute a probability like

\[ \Pr(1.5 < X \le 4.5) = F(4.5) - F(1.5) = 1 - \frac{1}{6} = \frac{5}{6} \]

or as

\[ \Pr(1.5 < X \le 4.5) = f(2) + f(3) = \frac{2}{6} + \frac{3}{6} = \frac{5}{6}. \]
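
The relationship between the p.d.f. and the distribution function in this
example can be sketched as follows. The function names `f` and `F` mirror the
text, but the code itself is only an illustrative sketch; it assumes `f` is
called with integer arguments.

```python
from fractions import Fraction

def f(x):
    """p.d.f. of Example 3: f(x) = x/6 for x = 1, 2, 3, zero elsewhere."""
    return Fraction(x, 6) if x in (1, 2, 3) else Fraction(0)

def F(x):
    """Distribution function F(x) = Pr(X <= x): sum f(w) over points w <= x."""
    return sum((f(w) for w in (1, 2, 3) if w <= x), Fraction(0))

# Pr(1.5 < X <= 4.5) computed two ways, as in the text.
via_F = F(4.5) - F(1.5)
via_f = f(2) + f(3)
print(via_F, via_f)
```

Evaluating `F` just below and at each of 1, 2, and 3 shows the steps of the
step function directly.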

While the properties of a distribution function F(x) = Pr (X ≤ x) are
discussed in more detail in Section 1.7, we can make a few observations
now since F(x) is a probability.

1. 0 ≤ F(x) ≤ 1.

2. F(x) is a nondecreasing function as it cumulates probability as x
increases.

3. F(y) = 0 for every point y that is less than the smallest value in the
space of X.

4. F(z) = 1 for every point z that is greater than the largest value in
the space of X.

5. If X is a random variable of the discrete type, then F(x) is a step
function and the height of the step at x in the space of X is equal
to the probability f(x) = Pr (X = x).


EXERCISES 

1.47. Let a card be selected from an ordinary deck of playing cards. The
outcome c is one of these 52 cards. Let X(c) = 4 if c is an ace, let X(c) = 3
if c is a king, let X(c) = 2 if c is a queen, let X(c) = 1 if c is a jack, and
let X(c) = 0 otherwise. Suppose that P assigns a probability of 1/52 to
each outcome c. Describe the induced probability P_X(A) on the space
𝒜 = {0, 1, 2, 3, 4} of the random variable X.

1.48. For each of the following, find the constant c so that f(x) satisfies the
condition of being a p.d.f. of one random variable X.

(a) f(x) = c(2/3)^x, x = 1, 2, 3, ..., zero elsewhere.

(b) f(x) = cx, x = 1, 2, 3, 4, 5, 6, zero elsewhere.

1.49. Let f(x) = x/15, x = 1, 2, 3, 4, 5, zero elsewhere, be the p.d.f. of X.
Find Pr (X = 1 or 2), Pr (1/2 < X < 5/2), and Pr (1 ≤ X ≤ 2).

1.50. Let f(x) be the p.d.f. of a random variable X. Find the distribution
function F(x) of X and sketch its graph along with that of f(x) if:

(a) f(x) = 1, x = 0, zero elsewhere.

(b) f(x) = 1/3, x = −1, 0, 1, zero elsewhere.

(c) f(x) = x/15, x = 1, 2, 3, 4, 5, zero elsewhere.

1.51. Let us select five cards at random and without replacement from an
ordinary deck of playing cards.

(a) Find the p.d.f. of X, the number of hearts in the five cards.

(b) Determine Pr (X ≤ 1).

1.52. Let X equal the number of heads in four independent flips of a coin.
Using certain assumptions, determine the p.d.f. of X and compute the
probability that X is equal to an odd number.

1.53. Let X have the p.d.f. f(x) = x/5050, x = 1, 2, 3, ..., 100, zero
elsewhere.

(a) Compute Pr (X ≤ 50).

(b) Show that the distribution function of X is F(x) = [x]([x] + 1)/10100,
for 1 ≤ x ≤ 100, where [x] is the greatest integer in x.

1.54. Let a bowl contain 10 chips of the same size and shape. One and only
one of these chips is red. Continue to draw chips from the bowl, one at a
time and at random and without replacement, until the red chip is drawn.

(a) Find the p.d.f. of X, the number of trials needed to draw the red chip.

(b) Compute Pr (X ≤ 4).

1.55. Cast a die a number of independent times until a six appears on the up
side of the die.

(a) Find the p.d.f. f(x) of X, the number of casts needed to obtain that
first six.

(b) Show that \sum_{x=1}^{\infty} f(x) = 1.

(c) Determine Pr (X = 1, 3, 5, 7, ...).

(d) Find the distribution function F(x) = Pr (X ≤ x).

1.56. Cast a die two independent times and let X equal the absolute value of
the difference of the two resulting values (the numbers on the up sides). Find
the p.d.f. of X.

Hint: It is not necessary to find a formula for the p.d.f.


1.6 Random Variables of the Continuous Type

A random variable was defined in Section 1.5, and only those of the 
discrete type were considered there. Let us begin the discussion of 
random variables of the continuous type with an example. 

Let a random experiment be a selection of a point that is interior
to a circle of radius 1 that has center at the origin of a two-dimensional
space. We call this space 𝒞 and the area of this circle is π. The random
selection is in such a way that the probability of being in a certain set
C interior to 𝒞 is proportional to the area of C; in particular, if C ⊂ 𝒞,

\[ P(C) = \frac{\text{area of } C}{\pi}. \]

First we observe that P(𝒞) = 1. In addition, if C_1 is that subset of
𝒞 that is in the first quadrant, P(C_1) = 1/4. If C_2 is the interior
of a circle of radius 1/2 such that C_2 ⊂ 𝒞, then P(C_2) = π(1/2)²/π = 1/4. It is
interesting to note that the probability of a point, a line segment, or
any curve in 𝒞 is equal to zero because those areas would be zero. In
particular, if C_3 is the boundary of the set C_2 (that is, C_3 is the actual
circle of radius 1/2), then P(C_3) = 0.

We define a random variable X, associated with this random
experiment, as the distance of the selected point from the origin. The
space of X is 𝒜 = {x : 0 ≤ x < 1}. Of course, for any x ∈ 𝒜,
Pr (X = x) = 0, because X = x is the event that the random point falls
on a circle, symmetric with respect to the origin, of radius x and the
associated area equals zero. However, it does make sense to consider
the induced probability of the event X ≤ x, namely the distribution
function of X. If x ∈ 𝒜, then

\[ F(x) = \Pr(X \le x) = \frac{\text{area of a certain circle of radius } x}{\pi} = \frac{\pi x^{2}}{\pi} = x^{2}, \qquad 0 \le x < 1. \]

Clearly, if x < 0, then F(x) = 0; and if x ≥ 1, then F(x) = 1. Thus we
can write

\[ F(x) = \begin{cases} 0, & x < 0, \\ x^{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases} \]

Recall, in the discrete case, we had a function f that was associated
with F through the equation

\[ F(x) = \sum_{w \le x} f(w). \]

Either F or f could be used to compute probabilities like

\[ \Pr(a < X \le b) = F(b) - F(a) = \sum_{w \in A} f(w), \]

where A = {w : a < w ≤ b}. We have observed, in this continuous case,
that Pr (X = x) = 0, so a summation of such probabilities is no longer
appropriate. However, it is easy to find an integral that relates F to f
through

\[ F(x) = \int_{w \le x} f(w)\, dw. \]

Since 𝒜 = {x : 0 ≤ x < 1}, this can be written as

\[ F(x) = x^{2} = \int_{0}^{x} f(w)\, dw, \qquad x \in \mathcal{A}. \]

By one form of the fundamental theorem of calculus, we know that the
derivative of the right-hand member of this equation is f(x). Thus
taking derivatives of each member of the equation, we obtain

\[ 2x = f(x), \qquad 0 \le x < 1. \]

Of course, at x = 0, this is only a right-hand derivative. We observe
that f(x) ≥ 0, x ∈ 𝒜, and

\[ \int_{0}^{1} 2x\, dx = 1. \]

Probabilities can now be computed through

\[ \Pr(X \in A) = \int_{A} f(w)\, dw. \]

For illustration,

\[ \Pr\left(\tfrac{1}{4} < X \le \tfrac{1}{2}\right) = \int_{1/4}^{1/2} 2w\, dw = \left[w^{2}\right]_{1/4}^{1/2} = \frac{1}{4} - \frac{1}{16} = \frac{3}{16}. \]
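
A simulation gives an informal check of this value. The sketch below is not
part of the text's argument: it assumes a rejection-sampling scheme (draw a
point in the enclosing square and keep it if it falls inside the circle), and
the names `sample_distance`, `rng`, and the seed 12345 are arbitrary choices
made for illustration.

```python
import random

def sample_distance(rng):
    """Pick a point uniformly inside the unit circle (by rejection) and return its distance from the origin."""
    while True:
        u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        r2 = u * u + v * v
        if r2 < 1:                    # keep only points interior to the circle
            return r2 ** 0.5

rng = random.Random(12345)            # fixed seed so the run is repeatable
n = 200_000
xs = [sample_distance(rng) for _ in range(n)]

# Empirical Pr(1/4 < X <= 1/2); the text's exact value is 3/16 = 0.1875.
estimate = sum(0.25 < x <= 0.5 for x in xs) / n
exact = 0.5 ** 2 - 0.25 ** 2          # F(1/2) - F(1/4) with F(x) = x^2
print(estimate, exact)
```

The estimate fluctuates from seed to seed, so it should be read as agreeing
with 3/16 only up to sampling error.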

With the background of this example, we give the definition of a 
random variable of the continuous type. 

Let X denote a random variable with a one-dimensional space 𝒜,
which consists of an interval or a union of intervals. Let a function f(x)
be nonnegative such that

\[ \int_{\mathcal{A}} f(x)\, dx = 1. \]

Whenever a probability set function P(A), A ⊂ 𝒜, can be expressed
in terms of such an f(x) by

\[ P(A) = \Pr(X \in A) = \int_{A} f(x)\, dx, \]

then X is said to be a random variable of the continuous type and f(x)
is called the probability density function (p.d.f.) of X.

Example 1. Let the random variable of the continuous type X equal the
distance in feet between bad records of a used computer tape. Say that the
space of X is 𝒜 = {x : 0 < x < ∞}. Suppose that a reasonable probability
model for X is given by the p.d.f.

\[ f(x) = \frac{1}{40} e^{-x/40}, \qquad x \in \mathcal{A}. \]

Here f(x) > 0 for x ∈ 𝒜, and

\[ \int_{0}^{\infty} \frac{1}{40} e^{-x/40}\, dx = \left[-e^{-x/40}\right]_{0}^{\infty} = 1. \]

If we are interested in the probability that the distance between bad records
is greater than 40 feet, then A = {x : 40 < x < ∞} and

\[ \Pr(X \in A) = \int_{40}^{\infty} \frac{1}{40} e^{-x/40}\, dx = e^{-1}. \]

The p.d.f. and the probability of interest are depicted in Figure 1.5.
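
Both integrals in this example can be approximated numerically. The sketch
below uses a simple midpoint rule; the helper `midpoint_integral`, the step
count, and the finite upper limit 2000 that stands in for infinity are all
assumptions made for illustration.

```python
import math

def f(x):
    """Assumed model from Example 1: f(x) = (1/40) e^(-x/40) for x > 0, zero elsewhere."""
    return math.exp(-x / 40) / 40 if x > 0 else 0.0

def midpoint_integral(g, a, b, steps=100_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# The upper limit 2000 stands in for infinity; the neglected tail is e^(-50).
total = midpoint_integral(f, 0, 2000)
tail = midpoint_integral(f, 40, 2000)
print(total, tail, math.exp(-1))
```

The second value approximates the probability that the distance exceeds 40
feet and can be compared with e^(-1) printed beside it.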

FIGURE 1.5 [graph of the p.d.f. f(x), with the probability of interest shown as an area]

If we restrict ourselves to random variables of either the discrete
type or the continuous type, we may work exclusively with the p.d.f.
f(x). This affords an enormous simplification; but it should be
recognized that this simplification is obtained at considerable cost from
a mathematical point of view. Not only shall we exclude from
consideration many random variables that do not have these types of
distributions, but we shall also exclude many interesting subsets of the
space. In this book, however, we shall in general restrict ourselves to
these simple types of random variables.

Remarks. Let X denote the number of spots that show when a die is cast.
We can assume that X is a random variable with 𝒜 = {1, 2, ..., 6} and with
a p.d.f. f(x) = 1/6, x ∈ 𝒜. Other assumptions can be made to provide different
mathematical models for this experiment. Experimental evidence can be used
to help one decide which model is the more realistic. Next, let X denote the
point at which a balanced pointer comes to rest. If the circumference is
graduated 0 ≤ x < 1, a reasonable mathematical model for this experiment is
to take X to be a random variable with 𝒜 = {x : 0 ≤ x < 1} and with a p.d.f.
f(x) = 1, x ∈ 𝒜.

Both types of probability density functions can be used as distributional
models for many random variables found in real situations. For
illustrations consider the following. If X is the number of automobile
accidents during a given day, then f(0), f(1), f(2), ... represent the probabilities
of 0, 1, 2, ... accidents. On the other hand, if X is length of life of a female
born in a certain community, the integral [area under the graph of f(x) that
lies above the x-axis and between the vertical lines x = 40 and x = 50]

\[ \int_{40}^{50} f(x)\, dx \]

represents the probability that she dies between 40 and 50 (or the percentage
of those females dying between 40 and 50). A particular f(x) will be suggested
later for each of these situations, but again experimental evidence must be used
to decide whether we have realistic models.

Our notation can be considerably simplified when we restrict
ourselves to random variables of the continuous or discrete types.
Suppose that the space of a continuous type of random variable X is
𝒜 = {x : 0 < x < ∞} and that the p.d.f. of X is e^{−x}, x ∈ 𝒜. We shall
in no manner alter the distribution of X [that is, alter any P(A), A ⊂ 𝒜]
if we extend the definition of the p.d.f. of X by writing

\[ f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0 & \text{elsewhere,} \end{cases} \]

and then refer to f(x) as the p.d.f. of X. We have

\[ \int_{-\infty}^{\infty} f(x)\, dx = \int_{-\infty}^{0} 0\, dx + \int_{0}^{\infty} e^{-x}\, dx = 1. \]

Thus we may treat the entire axis of reals as though it were the space
of X. Accordingly, we now replace

\[ \int_{\mathcal{A}} f(x)\, dx \quad \text{by} \quad \int_{-\infty}^{\infty} f(x)\, dx. \]

If f(x) is the p.d.f. of a continuous type of random variable X and
if A is the set {x : a < x < b}, then P(A) = Pr (X ∈ A) can be written
as

\[ \Pr(a < X < b) = \int_{a}^{b} f(x)\, dx. \]

Moreover, if A = {a}, then

\[ P(A) = \Pr(X \in A) = \Pr(X = a) = \int_{a}^{a} f(x)\, dx = 0, \]

since the integral \int_{a}^{a} f(x) dx is defined in calculus to be zero. That is, if
X is a random variable of the continuous type, the probability of every
set consisting of a single point is zero. This fact enables us to write, say,

\[ \Pr(a < X < b) = \Pr(a \le X \le b). \]

More important, this fact allows us to change the value of the p.d.f.
of a continuous type of random variable X at a single point without
altering the distribution of X. For instance, the p.d.f.

\[ f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0 & \text{elsewhere,} \end{cases} \]

can be written as

\[ f(x) = \begin{cases} e^{-x}, & 0 \le x < \infty, \\ 0 & \text{elsewhere,} \end{cases} \]

without changing any P(A). We observe that these two functions differ
only at x = 0 and Pr (X = 0) = 0. More generally, if two probability
density functions of random variables of the continuous type differ
only on a set having probability zero, the two corresponding
probability set functions are exactly the same. Unlike the continuous
type, the p.d.f. of a discrete type of random variable may not be
changed at any point, since a change in such a p.d.f. alters the
distribution of probability.

Example 2. Let the random variable X of the continuous type have
the p.d.f. f(x) = 2/x³, 1 < x < ∞, zero elsewhere. The distribution function
of X is

\[ F(x) = \begin{cases} \displaystyle\int_{-\infty}^{x} 0\, dw = 0, & x < 1, \\[2ex] \displaystyle\int_{1}^{x} \frac{2}{w^{3}}\, dw = 1 - \frac{1}{x^{2}}, & 1 \le x. \end{cases} \]

The graph of this distribution function is depicted in Figure 1.6. Here F(x) is a
continuous function for all real numbers x; in particular, F(x) is everywhere
continuous from the right. Moreover, the derivative of F(x) with respect to
x exists at all points except at x = 1. Thus the p.d.f. of X is defined by this
derivative except at x = 1. Since the set A = {1} is a set of probability measure
zero [that is, P(A) = 0], we are free to define the p.d.f. at x = 1 in any manner
we please. One way to do this is to write f(x) = 2/x³, 1 < x < ∞, zero
elsewhere.
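
The statement that the p.d.f. is the derivative of F(x) away from x = 1 can be
explored numerically. The sketch below compares a central difference of F with
2/x³ at a few arbitrarily chosen points; `central_difference` and the step size
`h` are hypothetical helpers introduced for illustration.

```python
def F(x):
    """Distribution function of Example 2: 0 for x < 1, 1 - 1/x^2 for x >= 1."""
    return 0.0 if x < 1 else 1.0 - 1.0 / x**2

def f(x):
    """p.d.f. of Example 2: 2/x^3 for x > 1, zero elsewhere."""
    return 2.0 / x**3 if x > 1 else 0.0

def central_difference(g, x, h=1e-6):
    """Numerical derivative of g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)

for x in (0.5, 1.5, 2.0, 10.0):
    print(x, central_difference(F, x), f(x))

# At x = 1 the one-sided slopes disagree (0 from the left, about 2 from the
# right), so F has no derivative there.
h = 1e-6
left_slope = (F(1) - F(1 - h)) / h
right_slope = (F(1 + h) - F(1)) / h
print(left_slope, right_slope)
```

The mismatch between the two one-sided slopes at x = 1 is the numerical
counterpart of the corner in the graph of F(x).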





FIGURE 1.6 [graph of the distribution function F(x)]

EXERCISES

1.57. Let a point be selected from the sample space 𝒞 = {c : 0 < c < 10}. Let
C ⊂ 𝒞 and let the probability set function be P(C) = \int_{C} (1/10) dz. Define the
random variable X to be X(c) = c². Find the distribution function and the
p.d.f. of X.

1.58. Let the probability set function P(A) of the random variable X be
P(A) = \int_{A} f(x) dx, where f(x) = 2x/9, x ∈ 𝒜 = {x : 0 < x < 3}. Let
A_1 = {x : 0 < x < 1}, A_2 = {x : 2 < x < 3}. Compute P(A_1) = Pr (X ∈ A_1),
P(A_2) = Pr (X ∈ A_2), and P(A_1 ∪ A_2) = Pr (X ∈ A_1 ∪ A_2).

1.59. Let the space of the random variable X be 𝒜 = {x : 0 < x < 1}. If
A_1 = {x : 0 < x < 1/2} and A_2 = {x : 1/2 ≤ x < 1}, find P(A_2) if P(A_1) = 1/4.

1.60. Let the space of the random variable X be 𝒜 = {x : 0 < x < 10} and
let P(A_1) = 3/8, where A_1 = {x : 1 < x < 5}. Show that P(A_2) ≤ 5/8, where
A_2 = {x : 5 ≤ x < 10}.

1.61. Let the subsets A_1 = {x : 1/4 < x < 1/2} and A_2 = {x : 1/2 ≤ x < 1} of the
space 𝒜 = {x : 0 < x < 1} of the random variable X be such that P(A_1) = 1/8
and P(A_2) = 1/2. Find P(A_1 ∪ A_2), P(A_1*), and P(A_1* ∩ A_2*).

1.62. Given \int_{A} [1/π(1 + x²)] dx, where A ⊂ 𝒜 = {x : −∞ < x < ∞}. Show
that the integral could serve as a probability set function of a random
variable X whose space is 𝒜.


1.63. Let the probability set function of the random variable X be

\[ P(A) = \int_{A} e^{-x}\, dx, \qquad \text{where } \mathcal{A} = \{x : 0 < x < \infty\}. \]

Let A_k = {x : 2 − 1/k < x ≤ 3}, k = 1, 2, 3, .... Find \lim_{k \to \infty} A_k and
P(\lim_{k \to \infty} A_k). Find P(A_k) and \lim_{k \to \infty} P(A_k). Note that
\lim_{k \to \infty} P(A_k) = P(\lim_{k \to \infty} A_k).

1.64. For each of the following probability density functions of X, compute
Pr (|X| < 1) and Pr (X² < 9).

(a) f(x) = x²/18, −3 < x < 3, zero elsewhere.

(b) f(x) = (x + 2)/18, −2 < x < 4, zero elsewhere.

1.65. Let f(x) = 1/x², 1 < x < ∞, zero elsewhere, be the p.d.f. of X.
If A_1 = {x : 1 < x < 2} and A_2 = {x : 4 < x < 5}, find P(A_1 ∪ A_2) and
P(A_1 ∩ A_2).

1.66. A mode of a distribution of one random variable X is a value of x that
maximizes the p.d.f. f(x). For X of the continuous type, f(x) must be
continuous. If there is only one such x, it is called the mode of the
distribution. Find the mode of each of the following distributions:

(a) f(x) = (1/2)^x, x = 1, 2, 3, ..., zero elsewhere.

(b) f(x) = 12x²(1 − x), 0 < x < 1, zero elsewhere.

(c) f(x) = (1/2)x²e^{−x}, 0 < x < ∞, zero elsewhere.


1.67. A median of a distribution of one random variable X of the
discrete or continuous type is a value of x such that Pr (X < x) ≤ 1/2
and Pr (X ≤ x) ≥ 1/2. If there is only one such x, it is called the
median of the distribution. Find the median of each of the following
distributions:

(a) f(x) = \frac{4!}{x!\,(4 - x)!} \left(\frac{1}{4}\right)^{x} \left(\frac{3}{4}\right)^{4-x}, x = 0, 1, 2, 3, 4, zero elsewhere.

(b) f(x) = 3x², 0 < x < 1, zero elsewhere.

(c) f(x) = \frac{1}{\pi(1 + x^{2})}, −∞ < x < ∞.

Hint: In parts (b) and (c), Pr (X < x) = Pr (X ≤ x) and thus that
common value must equal 1/2 if x is to be the median of the distribution.

1.68. Let 0 < p < 1. A (100p)th percentile (quantile of order p) of the
distribution of a random variable X is a value ξ_p such that Pr (X < ξ_p) ≤ p
and Pr (X ≤ ξ_p) ≥ p. Find the twentieth percentile of the distribution that
has p.d.f. f(x) = 4x³, 0 < x < 1, zero elsewhere.

Hint: With a continuous-type random variable X, Pr (X < ξ_p) =
Pr (X ≤ ξ_p) and hence that common value must equal p.

1.69. Find the distribution function F(x) associated with each of the
following probability density functions. Sketch the graphs of f(x) and F(x).

(a) f(x) = 3(1 − x)², 0 < x < 1, zero elsewhere.

(b) f(x) = 1/x², 1 < x < ∞, zero elsewhere.

(c) f(x) = 1/3, 0 < x < 1 or 2 < x < 4, zero elsewhere.

Also find the median and 25th percentile of each of these distributions.

1.70. Consider the distribution function F(x) = 1 − e^{−x} − xe^{−x}, 0 ≤ x < ∞,
zero elsewhere. Find the p.d.f., the mode, and the median (by numerical
methods) of this distribution.


1.7 Properties of the Distribution Function

In Section 1.5 we defined the distribution function of a
random variable X as F(x) = Pr (X ≤ x). This concept was used
in Section 1.6 to find the probability distribution of a random
variable of the continuous type. So, in terms of the p.d.f. f(x), we know
that

\[ F(x) = \sum_{w \le x} f(w), \]

for the discrete type of random variable, and

\[ F(x) = \int_{-\infty}^{x} f(w)\, dw, \]

for the continuous type of random variable. We speak of a distribution
function F(x) as being of the continuous or discrete type, depending
on whether the random variable is of the continuous or discrete type.

Remark. If X is a random variable of the continuous type, the p.d.f. f(x)
has at most a finite number of discontinuities in every finite interval. This
means (1) that the distribution function F(x) is everywhere continuous and (2)
that the derivative of F(x) with respect to x exists and is equal to f(x) at each
point of continuity of f(x). That is, F′(x) = f(x) at each point of continuity
of f(x). If the random variable X is of the discrete type, most surely the p.d.f.
f(x) is not the derivative of F(x) with respect to x (that is, with respect to
Lebesgue measure); but f(x) is the (Radon-Nikodym) derivative of F(x) with
respect to a counting measure. A derivative is often called a density.
Accordingly, we call these derivatives probability density functions.

There are several properties of a distribution function F(x) that can
be listed as a consequence of the properties of the probability set
function. Some of these are the following. In listing these properties,
we shall not restrict X to be a random variable of the discrete or
continuous type. We shall use the symbols F(∞) and F(−∞) to mean
\lim_{x \to \infty} F(x) and \lim_{x \to -\infty} F(x), respectively. In like manner, the symbols
{x : x ≤ ∞} and {x : x ≤ −∞} represent, respectively, the limits of the
sets {x : x ≤ b} and {x : x ≤ −b} as b → ∞.

1. 0 ≤ F(x) ≤ 1 because 0 ≤ Pr (X ≤ x) ≤ 1.

2. F(x) is a nondecreasing function of x. For, if x′ < x″, then

\[ \{x : x \le x''\} = \{x : x \le x'\} \cup \{x : x' < x \le x''\} \]

and

\[ \Pr(X \le x'') = \Pr(X \le x') + \Pr(x' < X \le x''). \]

That is,

\[ F(x'') - F(x') = \Pr(x' < X \le x'') \ge 0. \]

3. F(∞) = 1 and F(−∞) = 0 because the set {x : x ≤ ∞} is the
entire one-dimensional space and the set {x : x ≤ −∞} is the null set.

From the proof of property 2, it is observed that, if a < b, then

\[ \Pr(a < X \le b) = F(b) - F(a). \]

Suppose that we want to use F(x) to compute the probability
Pr (X = b). To do this, consider, with h > 0,

\[ \lim_{h \to 0} \Pr(b - h < X \le b) = \lim_{h \to 0} [F(b) - F(b - h)]. \]

Intuitively, it seems that \lim_{h \to 0} Pr (b − h < X ≤ b) should exist and be
equal to Pr (X = b) because, as h tends to zero, the limit of the set
{x : b − h < x ≤ b} is the set that contains the single point x = b. The
fact that this limit is Pr (X = b) is a theorem that we accept without
proof. Accordingly, we have

\[ \Pr(X = b) = F(b) - F(b-), \]

where F(b−) is the left-hand limit of F(x) at x = b. That is, the
probability that X = b is the height of the step that F(x) has at x = b.
Hence, if the distribution function F(x) is continuous at x = b, then
Pr (X = b) = 0.
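
The step-height interpretation can be illustrated with a small discrete
example. The sketch below borrows the p.d.f. f(x) = x/6, x = 1, 2, 3, from
Example 3 of Section 1.5 as an assumed test case; the names `points` and `jump`
and the increment h = 1/1000 are hypothetical choices, and F(b − h) equals the
left-hand limit F(b−) here only because the points of positive probability are
more than h apart.

```python
from fractions import Fraction

points = {1: Fraction(1, 6), 2: Fraction(2, 6), 3: Fraction(3, 6)}  # f(x) = x/6

def F(x):
    """Distribution function: total probability at points w <= x."""
    return sum((p for w, p in points.items() if w <= x), Fraction(0))

def jump(b, h=Fraction(1, 1000)):
    """F(b) - F(b - h); for this step function it equals F(b) - F(b-) once h < 1."""
    return F(b) - F(b - h)

for b in (1, 2, Fraction(5, 2), 3):
    print(b, jump(b))
```

At b = 5/2, a continuity point of this F, the computed jump is zero, in line
with the last sentence above.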

There is a fourth property of F(x) that is now listed.

4. F(x) is continuous from the right, that is, right-continuous.

To prove this property, consider, with h > 0,

\[ \lim_{h \to 0} \Pr(a < X \le a + h) = \lim_{h \to 0} [F(a + h) - F(a)]. \]

We accept without proof a theorem which states, with h > 0, that

\[ \lim_{h \to 0} \Pr(a < X \le a + h) = P(\emptyset) = 0. \]

Here also, the theorem is intuitively appealing because, as h tends to
zero, the limit of the set {x : a < x ≤ a + h} is the null set. Accordingly,
we write

\[ 0 = F(a+) - F(a), \]

where F(a+) is the right-hand limit of F(x) at x = a. Hence F(x) is
continuous from the right at every point x = a.

Remark. In the arguments concerning several of these properties, we
appeal to the reader's intuition. However, most of these properties can be
proved in formal ways using the definition of lim A_k, given in Exercises 1.7
and 1.8, and the fact that the probability set function P is countably additive;
that is, P enjoys (b) of Definition 7.

The preceding discussion may be summarized in the following
manner: A distribution function $F(x)$ is a nondecreasing function of $x$,
which is everywhere continuous from the right and has $F(-\infty) = 0$,
$F(\infty) = 1$. The probability $\Pr(a < X \le b)$ is equal to the difference
$F(b) - F(a)$. If $x$ is a discontinuity point of $F(x)$, then the probability
$\Pr(X = x)$ is equal to the jump which the distribution function has at
the point $x$. If $x$ is a continuity point of $F(x)$, then $\Pr(X = x) = 0$.

Remark. The definition of the distribution function makes it clear that the
probability set function $P$ determines the distribution function $F$. It is true,
although not so obvious, that a probability set function $P$ can be found from
a distribution function $F$. That is, $P$ and $F$ give the same information about
the distribution of probability, and which function is used is a matter of
convenience.

Often, probability models can be constructed that make reasonable
assumptions about the probability set function and thus the
distribution function. For a simple illustration, consider an experiment
in which one chooses at random a point from the closed interval $[a, b]$,
$a < b$, that is on the real line. Thus the sample space $\mathcal{C}$ is $[a, b]$. Let the
random variable $X$ be the identity function defined on $\mathcal{C}$. Thus the
space $\mathcal{A}$ of $X$ is $\mathcal{A} = \mathcal{C}$. Suppose that it is reasonable to assume, from
the nature of the experiment, that if an interval $A$ is a subset of $\mathcal{A}$, the
probability of the event $A$ is proportional to the length of $A$. Hence,
if $A$ is the interval $[a, x]$, $x \le b$, then

$$P(A) = \Pr(X \in A) = \Pr(a \le X \le x) = c(x - a),$$

where $c$ is the constant of proportionality.

In the expression above, if we take $x = b$, we have

$$1 = \Pr(a \le X \le b) = c(b - a),$$

so $c = 1/(b - a)$. Thus we will have an appropriate probability model
if we take the distribution function of $X$, $F(x) = \Pr(X \le x)$, to be

$$F(x) = \begin{cases} 0, & x < a, \\ \dfrac{x - a}{b - a}, & a \le x \le b, \\ 1, & b < x. \end{cases}$$



FIGURE 1.7


Accordingly, the p.d.f. of $X$, $f(x) = F'(x)$, may be written

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0 & \text{elsewhere.} \end{cases}$$

The derivative of $F(x)$ does not exist at $x = a$ nor at $x = b$; but the set
$\{x : x = a, b\}$ is a set of probability measure zero, and we elect to define
$f(x)$ to be equal to $1/(b - a)$ at those two points, just as a matter of
convenience. We observe that this p.d.f. is a constant on $\mathcal{A}$. If the p.d.f.
of one or more variables of the continuous type or of the discrete type
is a constant on the space $\mathcal{A}$, we say that the probability is distributed
uniformly over $\mathcal{A}$. Thus, in the example above, we say that $X$ has a
uniform distribution over the interval $[a, b]$.
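
The uniform model can be sketched in a few lines of Python. This is only an illustration: the function names `uniform_cdf` and `interval_prob` and the endpoints 2 and 6 are assumptions chosen for the example, not notation from the text. It checks that the probability of an interval inside $[a, b]$ is its length divided by $b - a$.

```python
from fractions import Fraction

def uniform_cdf(x, a, b):
    """Distribution function F(x) of the uniform distribution on [a, b]."""
    if x < a:
        return Fraction(0)
    if x <= b:
        return Fraction(x - a) / Fraction(b - a)
    return Fraction(1)

def interval_prob(lo, hi, a, b):
    """Pr(lo < X <= hi) = F(hi) - F(lo)."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)

a, b = 2, 6                           # hypothetical endpoints
p_half = interval_prob(3, 5, a, b)    # interval of length 2 inside [2, 6]
p_all = interval_prob(a, b, a, b)     # the whole interval
print(p_half, p_all)
```

Exact rational arithmetic is used so that the proportionality to length can be compared without rounding.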

We now give an illustrative example of a distribution that is neither 
of the discrete nor continuous type. 

Example 1. Let a distribution function be given by

$$F(x) = \begin{cases} 0, & x < 0, \\ \dfrac{x + 1}{2}, & 0 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Then, for instance,

$$\Pr(-3 < X \le \tfrac{1}{2}) = F(\tfrac{1}{2}) - F(-3) = \tfrac{3}{4} - 0 = \tfrac{3}{4}$$

and

$$\Pr(X = 0) = F(0) - F(0-) = \tfrac{1}{2} - 0 = \tfrac{1}{2}.$$

The graph of $F(x)$ is shown in Figure 1.7. We see that $F(x)$ is not always
continuous, nor is it a step function. Accordingly, the corresponding
distribution is neither of the continuous type nor of the discrete type. It may
be described as a mixture of those types.
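
A small Python sketch can make the jump interpretation concrete. It assumes the distribution function of Example 1 is $F(x) = (x + 1)/2$ on $0 \le x < 1$; the helper `jump` and its step size `h` are illustrative choices, and the left-hand limit is only approximated by evaluating $F$ slightly to the left of the point.

```python
from fractions import Fraction

def F(x):
    """Distribution function assumed for Example 1 (mixture of discrete and continuous types)."""
    if x < 0:
        return Fraction(0)
    if x < 1:
        return (Fraction(x) + 1) / 2
    return Fraction(1)

def jump(F, b, h=Fraction(1, 10**9)):
    """Approximate Pr(X = b) = F(b) - F(b-) with a small h > 0 standing in for the limit."""
    return F(b) - F(b - h)

p_interval = F(Fraction(1, 2)) - F(-3)   # Pr(-3 < X <= 1/2)
p_zero = jump(F, 0)                      # step of F at x = 0
p_half = jump(F, Fraction(1, 2))         # x = 1/2 is a continuity point
print(p_interval, p_zero, float(p_half))
```

At the discontinuity $x = 0$ the difference stays at the height of the step however small `h` is, while at a continuity point it shrinks with `h`.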
Distributions that are mixtures of the continuous and discrete
types do, in fact, occur frequently in practice. For illustration, in life
testing, suppose we know that the length of life, say $X$, exceeds the
number $b$, but the exact value is unknown. This is called censoring. For
instance, this can happen when a subject in a cancer study simply
disappears; the investigator knows that the subject has lived a certain
number of months, but the exact length of life is unknown. Or it might
happen when an investigator does not have enough time in an
investigation to observe the moments of deaths of all the animals, say
rats, in some study. Censoring can also occur in the insurance industry;
in particular, consider a loss with a limited-pay policy in which the top
amount is exceeded but it is not known by how much.

Example 2. Reinsurance companies are concerned with large losses
because they might agree, for illustration, to cover losses due to wind damages
that are between \$2,000,000 and \$10,000,000. Say that $X$ equals the size of a
wind loss in millions of dollars, and suppose that it has the distribution
function

$$F(x) = \begin{cases} 0, & -\infty < x < 0, \\ 1 - \left(\dfrac{10}{10 + x}\right)^3, & 0 \le x < \infty. \end{cases}$$

If losses beyond \$10,000,000 are reported only as 10, then the distribution
function of this censored distribution is

$$F(x) = \begin{cases} 0, & -\infty < x < 0, \\ 1 - \left(\dfrac{10}{10 + x}\right)^3, & 0 \le x < 10, \\ 1, & 10 \le x < \infty, \end{cases}$$

which has a jump of $[10/(10 + 10)]^3 = \tfrac{1}{8}$ at $x = 10$.
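
The censoring step can be sketched in Python as follows. The function names `F_loss` and `F_censored` and the parameter `cap` are illustrative assumptions; the uncensored distribution function is taken to be $1 - [10/(10 + x)]^3$ for $x \ge 0$, as in the example.

```python
from fractions import Fraction

def F_loss(x):
    """Uncensored distribution function: 1 - (10/(10 + x))^3 for x >= 0, zero for x < 0."""
    if x < 0:
        return Fraction(0)
    return 1 - (Fraction(10) / (10 + Fraction(x))) ** 3

def F_censored(x, cap=10):
    """Losses beyond `cap` are reported as `cap`, so the function reaches 1 at x = cap."""
    if x >= cap:
        return Fraction(1)
    return F_loss(x)

# F_loss is continuous at 10, so F_loss(10) equals the left-hand limit of F_censored there.
jump_at_cap = F_censored(10) - F_loss(10)
print(jump_at_cap)
```

The jump is exactly the probability mass that the uncensored distribution places beyond the cap.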



We shall now point out an important fact about a function of a
random variable. Let $X$ denote a random variable with space $\mathcal{A}$.
Consider the function $Y = u(X)$ of the random variable $X$. Since $X$ is
a function defined on a sample space $\mathcal{C}$, then $Y = u(X)$ is a composite
function defined on $\mathcal{C}$. That is, $Y = u(X)$ is itself a random variable
which has its own space $\mathcal{B} = \{y : y = u(x),\ x \in \mathcal{A}\}$ and its own
probability set function. If $y \in \mathcal{B}$, the event $Y = u(X) \le y$ occurs when,
and only when, the event $X \in A \subset \mathcal{A}$ occurs, where $A = \{x : u(x) \le y\}$.
That is, the distribution function of $Y$ is

$$G(y) = \Pr(Y \le y) = \Pr[u(X) \le y] = P(A).$$
The following example illustrates a method of finding the distribution
function and the p.d.f. of a function of a random variable. This method
is called the distribution-function technique.


Example 3. Let $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, zero elsewhere, be the p.d.f. of
the random variable $X$. Define the random variable $Y$ by $Y = X^2$. We
wish to find the p.d.f. of $Y$. If $y \ge 0$, the probability $\Pr(Y \le y)$ is equivalent
to

$$\Pr(X^2 \le y) = \Pr(-\sqrt{y} \le X \le \sqrt{y}).$$

Accordingly, the distribution function of $Y$, $G(y) = \Pr(Y \le y)$, is given
by

$$G(y) = \begin{cases} 0, & y < 0, \\ \displaystyle\int_{-\sqrt{y}}^{\sqrt{y}} \tfrac{1}{2}\, dx = \sqrt{y}, & 0 \le y < 1, \\ 1, & 1 \le y. \end{cases}$$

Since $Y$ is a random variable of the continuous type, the p.d.f. of $Y$ is
$g(y) = G'(y)$ at all points of continuity of $g(y)$. Thus we may write

$$g(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}, & 0 < y < 1, \\ 0 & \text{elsewhere.} \end{cases}$$
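
As a numerical check of the distribution-function technique, the following Python sketch integrates the p.d.f. of $X$ over the set $\{x : x^2 \le y\}$ with a midpoint rule and compares the result with $\sqrt{y}$. The midpoint rule and the number of subintervals `n` are choices made for this illustration only.

```python
import math

def f(x):
    """p.d.f. of X in Example 3: 1/2 on (-1, 1), zero elsewhere."""
    return 0.5 if -1 < x < 1 else 0.0

def G(y, n=20000):
    """G(y) = Pr(X**2 <= y): midpoint-rule integral of f over [-sqrt(y), sqrt(y)]."""
    if y < 0:
        return 0.0
    r = math.sqrt(y)
    step = 2 * r / n
    return sum(f(-r + (i + 0.5) * step) for i in range(n)) * step

for y in (0.04, 0.25, 0.81):
    print(y, G(y), math.sqrt(y))
```

For $0 \le y < 1$ the two columns agree up to rounding, and for $y \ge 1$ the integral levels off at 1 because $f$ vanishes outside $(-1, 1)$.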


Remarks. Many authors use $f_X$ and $f_Y$ to denote the respective probability
density functions of the random variables $X$ and $Y$. Here we use $f$ and $g$
because we can avoid the use of subscripts. However, at other times, we will
use subscripts as in $f_X$ and $f_Y$ or even $f_1$ and $f_2$, depending upon the
circumstances. In a given example, we do not use the same symbol, without
subscripts, to represent different functions. That is, in Example 3, we do not
use $f(x)$ and $f(y)$ to represent different probability density functions.

In addition, while we ordinarily use the letter $x$ in the description of
the p.d.f. of $X$, this is not necessary at all because it is unimportant which
letter we use in describing a function. For illustration, in Example 3, we
could say that the random variable $Y$ has the p.d.f. $g(w) = 1/(2\sqrt{w})$, $0 < w < 1$,
zero elsewhere, and it would have exactly the same meaning as $Y$ has the
p.d.f. $g(y) = 1/(2\sqrt{y})$, $0 < y < 1$, zero elsewhere.

These remarks apply to other functions too, such as distribution functions.
In Example 3, we could have written the distribution function of $Y$, where
$0 \le w < 1$, as

$$F_Y(w) = \Pr(Y \le w) = \sqrt{w}.$$
EXERCISES

1.71. Given the distribution function

$$F(x) = \begin{cases} 0, & x < -1, \\ \dfrac{x + 2}{4}, & -1 \le x < 1, \\ 1, & 1 \le x. \end{cases}$$

Sketch the graph of $F(x)$ and then compute: (a) $\Pr(-\tfrac{1}{2} < X \le \tfrac{1}{2})$; (b)
$\Pr(X = 0)$; (c) $\Pr(X = 1)$; (d) $\Pr(2 < X \le 3)$.

1.72. Let $f(x) = 1$, $0 < x < 1$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = \sqrt{X}$.

Hint: $\Pr(Y \le y) = \Pr(\sqrt{X} \le y) = \Pr(X \le y^2)$, $0 < y < 1$.

1.73. Let $f(x) = x/6$, $x = 1, 2, 3$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = X^2$.

Hint: Note that $X$ is a random variable of the discrete type.

1.74. Let $f(x) = (4 - x)/16$, $-2 < x < 2$, zero elsewhere, be the p.d.f. of $X$.

(a) Sketch the distribution function and the p.d.f. of $X$ on the same set of
axes.

(b) If $Y = |X|$, compute $\Pr(Y \le 1)$.

(c) If $Z = X^2$, compute $\Pr(Z \le \tfrac{1}{4})$.

1.75. Let $X$ have the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Find the
distribution function and p.d.f. of $Y = X^2$.

1.76. Let $X$ have the p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Find the
distribution function and p.d.f. of $Y = -2 \ln X^4$.

1.77. Explain why, with $h > 0$, the two limits, $\lim_{h \to 0} \Pr(b - h < X \le b)$ and
$\lim_{h \to 0} F(b - h)$, exist.

Hint: Note that $\Pr(b - h < X \le b)$ is bounded below by zero and $F(b - h)$
is bounded above by both $F(b)$ and 1.

1.78. Let $F$ be the distribution function of the random variable $X$. If $m$ is
a number such that $F(m) = \tfrac{1}{2}$, show that $m$ is a median of the distribution.

1.79. Let $f(x) = \tfrac{1}{3}$, $-1 < x < 2$, zero elsewhere, be the p.d.f. of $X$. Find the
distribution function and the p.d.f. of $Y = X^2$.

Hint: Consider $\Pr(X^2 \le y)$ for two cases: $0 \le y < 1$ and $1 \le y < 4$.
1.8 Expectation of a Random Variable

Let $X$ be a random variable having a p.d.f. $f(x)$ such that we have
certain absolute convergence; namely, in the discrete case,

$$\sum_x |x| f(x) \quad \text{converges to a finite limit,}$$

or, in the continuous case,

$$\int_{-\infty}^{\infty} |x| f(x)\, dx \quad \text{converges to a finite limit.}$$

The expectation of a random variable is

$$E(X) = \sum_x x f(x), \quad \text{in the discrete case,}$$

or

$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx, \quad \text{in the continuous case.}$$


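Sometimes the expectation $E(X)$ is called the mathematical expectation
of $X$ or the expected value of $X$.

Remark. The terminology of expectation or expected value has its
origin in games of chance. This can be illustrated as follows: Four small similar
chips, numbered 1, 1, 1, and 2, respectively, are placed in a bowl and are mixed.
A player is blindfolded and is to draw a chip from the bowl. If she draws one
of the three chips numbered 1, she will receive one dollar. If she draws the chip
numbered 2, she will receive two dollars. It seems reasonable to assume that
the player has a "$\tfrac{3}{4}$ claim" on the \$1 and a "$\tfrac{1}{4}$ claim" on the \$2. Her "total
claim" is $(1)(\tfrac{3}{4}) + 2(\tfrac{1}{4}) = \tfrac{5}{4}$, that is, \$1.25. Thus the expectation of $X$ is precisely
the player's claim in this game.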

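Example 1. Let the random variable $X$ of the discrete type have the p.d.f.
given by the table

| $x$    | 1              | 2              | 3              | 4              |
|--------|----------------|----------------|----------------|----------------|
| $f(x)$ | $\tfrac{4}{10}$ | $\tfrac{1}{10}$ | $\tfrac{3}{10}$ | $\tfrac{2}{10}$ |

Here $f(x) = 0$ if $x$ is not equal to one of the first four positive integers. This
illustrates the fact that there is no need to have a formula to describe a p.d.f.
We have

$$E(X) = (1)(\tfrac{4}{10}) + 2(\tfrac{1}{10}) + 3(\tfrac{3}{10}) + 4(\tfrac{2}{10}) = \tfrac{23}{10} = 2.3.$$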
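
A p.d.f. given by a table maps naturally onto a dictionary. The sketch below assumes the tabled values are $f(1) = \tfrac{4}{10}$, $f(2) = \tfrac{1}{10}$, $f(3) = \tfrac{3}{10}$, $f(4) = \tfrac{2}{10}$ (an assumption where the source table is hard to read), and the helper name `expectation` is illustrative.

```python
from fractions import Fraction

# Tabled p.d.f. (assumed values); no formula is needed to describe it
pdf = {1: Fraction(4, 10), 2: Fraction(1, 10), 3: Fraction(3, 10), 4: Fraction(2, 10)}

def expectation(pdf, u=lambda x: x):
    """E[u(X)] = sum over x of u(x) f(x) for a discrete p.d.f. stored as a dict."""
    return sum(u(x) * p for x, p in pdf.items())

total = sum(pdf.values())   # a p.d.f. must sum to 1
mean = expectation(pdf)
print(total, mean)
```

Passing a different function `u` to the same helper gives other expectations of the same distribution.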
Example 2. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} 4x^3, & 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

Then

$$E(X) = \int_0^1 x(4x^3)\, dx = \int_0^1 4x^4\, dx = \left[\frac{4x^5}{5}\right]_0^1 = \frac{4}{5}.$$


Let us consider a function of a random variable $X$ with space
$\mathcal{A}$. Call this function $Y = u(X)$. For convenience, let $X$ be of
the continuous type and $y = u(x)$ be a continuous increasing function
of $x$ with an inverse function $x = w(y)$, which, of course, is
also increasing. So $Y$ is a random variable and its distribution function
is

$$G(y) = \Pr(Y \le y) = \Pr[u(X) \le y] = \Pr[X \le w(y)] = \int_{-\infty}^{w(y)} f(x)\, dx,$$

where $f(x)$ is the p.d.f. of $X$. By one form of the fundamental theorem
of calculus,

$$g(y) = G'(y) = \begin{cases} f[w(y)]\, w'(y), & y \in \mathcal{B}, \\ 0 & \text{elsewhere,} \end{cases}$$

where

$$\mathcal{B} = \{y : y = u(x),\ x \in \mathcal{A}\}.$$

By definition, given absolute convergence, the expected value of $Y$ is

$$E(Y) = \int_{-\infty}^{\infty} y\, g(y)\, dy.$$

Since $y = u(x)$, we might ask how $E(Y)$ compares to the integral

$$\int_{-\infty}^{\infty} u(x) f(x)\, dx.$$

To answer this, change the variable of integration through $y = u(x)$ or,
equivalently, $x = w(y)$. Since

$$\frac{dx}{dy} = w'(y) > 0,$$

we have

$$\int_{-\infty}^{\infty} u(x) f(x)\, dx = \int_{-\infty}^{\infty} y\, f[w(y)]\, w'(y)\, dy = \int_{-\infty}^{\infty} y\, g(y)\, dy.$$

That is, in this special case,

$$E(Y) = \int_{-\infty}^{\infty} y\, g(y)\, dy = \int_{-\infty}^{\infty} u(x) f(x)\, dx.$$


However, this is true more generally and it also makes no difference
whether $X$ is of the discrete or continuous type and $Y = u(X)$ need not
be an increasing function of $X$ (Exercise 1.87 illustrates this).

So if $Y = u(X)$ has an expectation, we can find it from

$$E[u(X)] = \int_{-\infty}^{\infty} u(x) f(x)\, dx, \tag{1}$$

in the continuous case, and

$$E[u(X)] = \sum_x u(x) f(x), \tag{2}$$

in the discrete case. Accordingly, we say that $E[u(X)]$ is the expectation
(mathematical expectation or expected value) of $u(X)$.
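
The claim that $E[u(X)]$ can be found without first deriving the p.d.f. of $Y = u(X)$ can be checked on a small discrete case. The distribution below (equally likely points $-2, \ldots, 2$) and the function $u(x) = x^2$ are hypothetical choices for illustration; $u$ is deliberately not increasing.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical discrete p.d.f.: equally likely points -2, -1, 0, 1, 2
f = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}

def u(x):
    return x * x  # not an increasing function of x

# Route 1: build the p.d.f. g(y) of Y = u(X), then compute the sum of y g(y)
g = defaultdict(Fraction)
for x, p in f.items():
    g[u(x)] += p
e_via_g = sum(y * p for y, p in g.items())

# Route 2: expression (2), the sum of u(x) f(x), without finding g
e_via_f = sum(u(x) * p for x, p in f.items())
print(dict(g), e_via_g, e_via_f)
```

Both routes give the same value; the second avoids the bookkeeping of collecting the $x$ values that map to the same $y$.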

Remark. If the mathematical expectation of $Y$ exists, recall that the
integral (or sum)

$$\int_{-\infty}^{\infty} |y|\, g(y)\, dy \quad \text{or} \quad \sum_y |y|\, g(y)$$

exists. Hence the existence of $E[u(X)]$ implies that the corresponding integral
(or sum) converges absolutely.

Next, we shall point out some fairly obvious but useful facts about
expectations when they exist.

1. If $k$ is a constant, then $E(k) = k$. This follows from expression (1)
[or (2)] upon setting $u = k$ and recalling that an integral (or sum)
of a constant times a function is the constant times the integral (or
sum) of the function. Of course, the integral (or sum) of the function
$f$ is 1.

2. If $k$ is a constant and $v$ is a function, then $E(kv) = kE(v)$. This follows
from expression (1) [or (2)] upon setting $u = kv$ and rewriting
expression (1) [or (2)] as $k$ times the integral (or sum) of the product
$vf$.

3. If $k_1$ and $k_2$ are constants and $v_1$ and $v_2$ are functions, then
$E(k_1 v_1 + k_2 v_2) = k_1 E(v_1) + k_2 E(v_2)$. This, too, follows from
expression (1) [or (2)] upon setting $u = k_1 v_1 + k_2 v_2$ because the integral
(or sum) of $(k_1 v_1 + k_2 v_2) f$ is equal to the integral (or sum) of $k_1 v_1 f$
plus the integral (or sum) of $k_2 v_2 f$. Repeated application
of this property shows that if $k_1, k_2, \ldots, k_m$ are constants and
$v_1, v_2, \ldots, v_m$ are functions, then

$$E(k_1 v_1 + k_2 v_2 + \cdots + k_m v_m) = k_1 E(v_1) + k_2 E(v_2) + \cdots + k_m E(v_m).$$

This property of expectation leads us to characterize the symbol $E$
as a linear operator.


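Example 3. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} 2(1 - x), & 0 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

Then

$$E(X) = \int_{-\infty}^{\infty} x f(x)\, dx = \int_0^1 (x)\, 2(1 - x)\, dx = \frac{1}{3},$$

$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\, dx = \int_0^1 (x^2)\, 2(1 - x)\, dx = \frac{1}{6},$$

and, of course,

$$E(6X + 3X^2) = 6(\tfrac{1}{3}) + 3(\tfrac{1}{6}) = \tfrac{5}{2}.$$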
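
The linearity of $E$ can be exercised directly on the p.d.f. $f(x) = 2(1 - x)$, $0 < x < 1$, of Example 3. In this sketch the closed form used in `moment` comes from integrating $2x^n - 2x^{n+1}$ over $(0, 1)$; the helper names are illustrative.

```python
from fractions import Fraction

def moment(n):
    """E(X^n) for f(x) = 2(1 - x) on (0, 1): integral of 2x^n - 2x^(n+1) from 0 to 1."""
    return Fraction(2, n + 1) - Fraction(2, n + 2)

def expect_poly(coeffs):
    """E(c0 + c1 X + c2 X^2 + ...) computed term by term, using linearity of E."""
    return sum(c * moment(n) for n, c in enumerate(coeffs))

e_x = moment(1)
e_x2 = moment(2)
e_combo = expect_poly([0, 6, 3])   # E(6X + 3X^2)
print(e_x, e_x2, e_combo)
```

Because `moment(0)` equals 1, the same helper also reproduces $E(k) = k$ for a constant $k$.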


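Example 4. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} \dfrac{x}{6}, & x = 1, 2, 3, \\ 0 & \text{elsewhere.} \end{cases}$$

Then

$$E(X^3) = \sum_x x^3 f(x) = \sum_{x=1}^{3} x^3\, \frac{x}{6} = \frac{1}{6} + \frac{16}{6} + \frac{81}{6} = \frac{98}{6}.$$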

Example 5. Let us divide, at random, a horizontal line segment of length
5 into two parts. If $X$ is the length of the left-hand part, it is reasonable to
assume that $X$ has the p.d.f.

$$f(x) = \begin{cases} \tfrac{1}{5}, & 0 < x < 5, \\ 0 & \text{elsewhere.} \end{cases}$$

The expected value of the length $X$ is $E(X) = \tfrac{5}{2}$ and the expected value of the
length $5 - X$ is $E(5 - X) = \tfrac{5}{2}$. But the expected value of the product of the
two lengths is equal to

$$E[X(5 - X)] = \int_0^5 x(5 - x)(\tfrac{1}{5})\, dx = \tfrac{25}{6} \ne (\tfrac{5}{2})^2.$$

That is, in general, the expected value of a product is not equal to the product
of the expected values.
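
The comparison in Example 5 can be reproduced exactly. The sketch below uses the moments of the uniform p.d.f. on $(0, 5)$, namely $E(X^n) = 5^n/(n + 1)$, which follows from integrating $x^n/5$ over $(0, 5)$; the function name `moment` is an illustrative choice.

```python
from fractions import Fraction

def moment(n, length=5):
    """E(X^n) for f(x) = 1/length on (0, length): length^n / (n + 1)."""
    return Fraction(length**n, n + 1)

e_x = moment(1)                          # E(X)
e_other = 5 - moment(1)                  # E(5 - X), by linearity
e_product = 5 * moment(1) - moment(2)    # E[X(5 - X)] = 5E(X) - E(X^2)
product_of_means = e_x * e_other
print(e_product, product_of_means)
```

The two printed values differ, which is the point of the example: $E$ is linear, but it does not in general factor over products.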

Example 6. A bowl contains five chips, which cannot be distinguished by
a sense of touch alone. Three of the chips are marked \$1 each and the
remaining two are marked \$4 each. A player is blindfolded and draws, at
random and without replacement, two chips from the bowl. The player is paid
an amount equal to the sum of the values of the two chips that he draws and
the game is over. If it costs \$4.75 to play this game, would we care to participate
for any protracted period of time? Because we are unable to distinguish the
chips by sense of touch, we assume that each of the 10 pairs that can be drawn
has the same probability of being drawn. Let the random variable $X$ be the
number of chips, of the two to be chosen, that are marked \$1. Then, under
our assumption, $X$ has the hypergeometric p.d.f.

$$f(x) = \begin{cases} \dfrac{\binom{3}{x}\binom{2}{2 - x}}{\binom{5}{2}}, & x = 0, 1, 2, \\ 0 & \text{elsewhere.} \end{cases}$$

If $X = x$, the player receives $u(x) = x + 4(2 - x) = 8 - 3x$ dollars. Hence his
mathematical expectation is equal to

$$E[8 - 3X] = \sum_{x=0}^{2} (8 - 3x) f(x) = \tfrac{44}{10},$$

or \$4.40.
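
The expected payoff of the chip game can be verified with the standard-library function `math.comb`. The names `payoff`, `cost`, and `expected_gain` are illustrative; the last one simply subtracts the \$4.75 entry fee to answer the question posed in the example.

```python
from fractions import Fraction
from math import comb

def f(x):
    """Hypergeometric p.d.f.: x chips marked $1 among 2 drawn from three $1 chips and two $4 chips."""
    if x in (0, 1, 2):
        return Fraction(comb(3, x) * comb(2, 2 - x), comb(5, 2))
    return Fraction(0)

def payoff(x):
    return x + 4 * (2 - x)  # 8 - 3x dollars

expected_payoff = sum(payoff(x) * f(x) for x in range(3))
cost = Fraction(475, 100)
expected_gain = expected_payoff - cost
print(expected_payoff, expected_gain)
```

A negative expected gain per play suggests that participating for a protracted period would not be attractive.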


EXERCISES 

1.80. Let $X$ have the p.d.f. $f(x) = (x + 2)/18$, $-2 < x < 4$, zero elsewhere.
Find $E(X)$, $E[(X + 2)^3]$, and $E[6X - 2(X + 2)^3]$.

1.81. Suppose that $f(x) = \tfrac{1}{5}$, $x = 1, 2, 3, 4, 5$, zero elsewhere, is the p.d.f. of
the discrete type of random variable $X$. Compute $E(X)$ and $E(X^2)$. Use
these two results to find $E[(X + 2)^2]$ by writing $(X + 2)^2 = X^2 + 4X + 4$.

1.82. Let $X$ be a number selected at random from a set of numbers
$\{51, 52, 53, \ldots, 100\}$. Approximate $E(1/X)$.

Hint: Find reasonable upper and lower bounds by finding integrals
bounding $E(1/X)$.

1.83. Let the p.d.f. $f(x)$ be positive at $x = -1, 0, 1$ and zero elsewhere.

⑻ If/ ⑼ =! ， find EiX 2 ). 

(b) If/(0) = I and if determine/ (- 1) and /(I), 

1.84. Let $X$ have the p.d.f. $f(x) = 3x^2$, $0 < x < 1$, zero elsewhere. Consider a
random rectangle whose sides are $X$ and $(1 - X)$. Determine the expected
value of the area of the rectangle.

1.85. A bowl contains 10 chips, of which 8 are marked \$2 each and 2 are
marked \$5 each. Let a person choose, at random and without replacement,
3 chips from this bowl. If the person is to receive the sum of the resulting
amounts, find his expectation.

1.86. Let $X$ be a random variable of the continuous type that has p.d.f. $f(x)$.
If $m$ is the unique median of the distribution of $X$ and $b$ is a real constant,
show that

$$E(|X - b|) = E(|X - m|) + 2 \int_m^b (b - x) f(x)\, dx,$$

provided that the expectations exist. For what value of $b$ is $E(|X - b|)$ a
minimum?

1.87. Let $f(x) = 2x$, $0 < x < 1$, zero elsewhere, be the p.d.f. of $X$.

(a) Compute $E(1/X)$.

(b) Find the distribution function and the p.d.f. of $Y = 1/X$.

(c) Compute $E(Y)$ and compare this result with the answer obtained in
part (a).

Hint: Here $\mathcal{A} = \{x : 0 < x < 1\}$; find $\mathcal{B}$.

1.88. Two distinct integers are chosen at random and without replacement
from the first six positive integers. Compute the expected value of the
absolute value of the difference of these two numbers.

1.9 Some Special Expectations 

Certain expectations, if they exist, have special names and symbols
to represent them. First, let $X$ be a random variable of the discrete type
having a p.d.f. $f(x)$. Then

$$E(X) = \sum_x x f(x).$$

If the discrete points of the space of positive probability density are
$a_1, a_2, a_3, \ldots$, then

$$E(X) = a_1 f(a_1) + a_2 f(a_2) + a_3 f(a_3) + \cdots.$$

This sum of products is seen to be a "weighted average" of the values
$a_1, a_2, a_3, \ldots$, the "weight" associated with each $a_i$ being $f(a_i)$. This
suggests that we call $E(X)$ the arithmetic mean of the values of $X$,
or, more simply, the mean value of $X$ (or the mean value of the
distribution).

The mean value $\mu$ of a random variable $X$ is defined, when it exists,
to be $\mu = E(X)$, where $X$ is a random variable of the discrete or of the
continuous type.

Another special expectation is obtained by taking $u(X) = (X - \mu)^2$.
If, initially, $X$ is a random variable of the discrete type having a p.d.f.
$f(x)$, then

$$E[(X - \mu)^2] = \sum_x (x - \mu)^2 f(x) = (a_1 - \mu)^2 f(a_1) + (a_2 - \mu)^2 f(a_2) + \cdots,$$

if $a_1, a_2, \ldots$ are the discrete points of the space of positive probability
density. This sum of products may be interpreted as a "weighted
average" of the squares of the deviations of the numbers $a_1, a_2, \ldots$
from the mean value $\mu$ of those numbers where the "weight" associated
with each $(a_i - \mu)^2$ is $f(a_i)$. This mean value of the square of the
deviation of $X$ from its mean value $\mu$ is called the variance of $X$ (or the
variance of the distribution).

The variance of $X$ will be denoted by $\sigma^2$, and we define $\sigma^2$, if it exists,
by $\sigma^2 = E[(X - \mu)^2]$, whether $X$ is a discrete or a continuous type of
random variable. Sometimes the variance of $X$ is written $\operatorname{var}(X)$.
It is worthwhile to observe that $\operatorname{var}(X)$ equals

$$\sigma^2 = E[(X - \mu)^2] = E(X^2 - 2\mu X + \mu^2);$$

and since $E$ is a linear operator,

$$\sigma^2 = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.$$

This frequently affords an easier way of computing the variance of $X$.
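
The two expressions for the variance can be compared on a small discrete distribution. The p.d.f. $f(x) = x/6$, $x = 1, 2, 3$, is reused here purely as an illustration; the variable names are assumptions made for the sketch.

```python
from fractions import Fraction

# Illustrative discrete p.d.f.: f(x) = x/6 for x = 1, 2, 3
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

mu = sum(x * p for x, p in f.items())
var_definition = sum((x - mu) ** 2 * p for x, p in f.items())   # E[(X - mu)^2]
var_shortcut = sum(x * x * p for x, p in f.items()) - mu ** 2   # E(X^2) - mu^2
print(mu, var_definition, var_shortcut)
```

The shortcut needs only the first two moments, so it avoids subtracting the mean inside the sum.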
It is customary to call $\sigma$ (the positive square root of the variance)
the standard deviation of $X$ (or the standard deviation of the
distribution). The number $\sigma$ is sometimes interpreted as a measure of
the dispersion of the points of the space relative to the mean value
$\mu$. We note that if the space contains only one point $x$ for which
$f(x) > 0$, then $\sigma = 0$.

Remark. Let the random variable $X$ of the continuous type have the p.d.f.
$f(x) = 1/(2a)$, $-a < x < a$, zero elsewhere, so that $\sigma = a/\sqrt{3}$ is the
standard deviation of the distribution of $X$. Next, let the random variable $Y$
of the continuous type have the p.d.f. $g(y) = 1/(4a)$, $-2a < y < 2a$, zero
elsewhere, so that $\sigma = 2a/\sqrt{3}$ is the standard deviation of the distribution of
$Y$. Here the standard deviation of $Y$ is greater than that of $X$; this reflects the
fact that the probability for $Y$ is more widely distributed (relative to the mean
zero) than is the probability for $X$.

We next define a third special mathematical expectation, called the
moment-generating function (abbreviated m.g.f.) of a random variable
$X$. Suppose that there is a positive number $h$ such that for $-h < t < h$
the mathematical expectation $E(e^{tX})$ exists. Thus

$$E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx,$$

if $X$ is a continuous type of random variable, or

$$E(e^{tX}) = \sum_x e^{tx} f(x),$$

if $X$ is a discrete type of random variable. This expectation is called the
moment-generating function (m.g.f.) of $X$ (or of the distribution) and
is denoted by $M(t)$. That is,

$$M(t) = E(e^{tX}).$$

It is evident that if we set $t = 0$, we have $M(0) = 1$. As will be seen by
example, not every distribution has an m.g.f., but it is difficult to
overemphasize the importance of an m.g.f., when it does exist. This
importance stems from the fact that the m.g.f. is unique and completely
determines the distribution of the random variable; thus, if two random
variables have the same m.g.f., they have the same distribution. This
property of an m.g.f. will be very useful in subsequent chapters. Proof
of the uniqueness of the m.g.f. is based on the theory of transforms in
analysis, and therefore we merely assert this uniqueness.
Although the fact that an m.g.f. (when it exists) completely
determines the distribution of one random variable will not be proved,
it does seem desirable to try to make the assertion plausible. This can
be done if the random variable is of the discrete type. For example, let
it be given that

$$M(t) = \tfrac{1}{10} e^{t} + \tfrac{2}{10} e^{2t} + \tfrac{3}{10} e^{3t} + \tfrac{4}{10} e^{4t}$$

is, for all real values of $t$, the m.g.f. of a random variable $X$ of the
discrete type. If we let $f(x)$ be the p.d.f. of $X$ and let $a, b, c, d, \ldots$ be
the discrete points in the space of $X$ at which $f(x) > 0$, then

$$M(t) = \sum_x e^{tx} f(x),$$

or

$$\tfrac{1}{10} e^{t} + \tfrac{2}{10} e^{2t} + \tfrac{3}{10} e^{3t} + \tfrac{4}{10} e^{4t} = f(a) e^{at} + f(b) e^{bt} + \cdots.$$

Because this is an identity for all real values of $t$, it seems that the
right-hand member should consist of but four terms and that each of
the four should equal, respectively, one of those in the left-hand
member; hence we may take $a = 1$, $f(a) = \tfrac{1}{10}$; $b = 2$, $f(b) = \tfrac{2}{10}$; $c = 3$,
$f(c) = \tfrac{3}{10}$; $d = 4$, $f(d) = \tfrac{4}{10}$. Or, more simply, the p.d.f. of $X$ is

$$f(x) = \begin{cases} \dfrac{x}{10}, & x = 1, 2, 3, 4, \\ 0 & \text{elsewhere.} \end{cases}$$

On the other hand, let $X$ be a random variable of the continuous
type and let it be given that

$$M(t) = \frac{1}{1 - t}, \quad t < 1,$$

is the m.g.f. of $X$. That is, we are given

$$\frac{1}{1 - t} = \int_{-\infty}^{\infty} e^{tx} f(x)\, dx, \quad t < 1.$$

It is not at all obvious how $f(x)$ is found. However, it is easy to see that
a distribution with p.d.f.

$$f(x) = \begin{cases} e^{-x}, & 0 < x < \infty, \\ 0 & \text{elsewhere} \end{cases}$$

has the m.g.f. $M(t) = (1 - t)^{-1}$, $t < 1$. Thus the random variable $X$
has a distribution with this p.d.f. in accordance with the assertion of
the uniqueness of the m.g.f.

Since a distribution that has an m.g.f. $M(t)$ is completely determined
by $M(t)$, it would not be surprising if we could obtain some
properties of the distribution directly from $M(t)$. For example, the
existence of $M(t)$ for $-h < t < h$ implies that derivatives of all order
exist at $t = 0$. Thus, using a theorem in analysis that allows us to
change the order of differentiation and integration, we have

$$\frac{dM(t)}{dt} = M'(t) = \int_{-\infty}^{\infty} x e^{tx} f(x)\, dx,$$

if $X$ is of the continuous type, or

$$\frac{dM(t)}{dt} = M'(t) = \sum_x x e^{tx} f(x),$$

if $X$ is of the discrete type. Upon setting $t = 0$, we have in either case

$$M'(0) = E(X) = \mu.$$

The second derivative of $M(t)$ is

$$M''(t) = \int_{-\infty}^{\infty} x^2 e^{tx} f(x)\, dx \quad \text{or} \quad \sum_x x^2 e^{tx} f(x),$$

so that $M''(0) = E(X^2)$. Accordingly, the $\operatorname{var}(X)$ equals

$$\sigma^2 = E(X^2) - \mu^2 = M''(0) - [M'(0)]^2.$$

For example, if $M(t) = (1 - t)^{-1}$, $t < 1$, as in the illustration above,
then

$$M'(t) = (1 - t)^{-2} \quad \text{and} \quad M''(t) = 2(1 - t)^{-3}.$$

Hence

$$\mu = M'(0) = 1$$

and

$$\sigma^2 = M''(0) - \mu^2 = 2 - 1 = 1.$$

Of course, we could have computed $\mu$ and $\sigma^2$ from the p.d.f. by

$$\mu = \int_{-\infty}^{\infty} x f(x)\, dx \quad \text{and} \quad \sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2,$$

respectively. Sometimes one way is easier than the other.
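
The relations $M'(0) = \mu$ and $M''(0) - [M'(0)]^2 = \sigma^2$ can be checked numerically for $M(t) = (1 - t)^{-1}$. The sketch below uses central finite differences, which is a choice made for this illustration (not a method from the text), so the results are approximations rather than exact derivatives; the step size `h` is likewise an assumption.

```python
def M(t):
    """m.g.f. of the illustration: M(t) = 1/(1 - t), valid for t < 1."""
    return 1.0 / (1.0 - t)

def first_derivative(g, t, h=1e-4):
    """Central-difference approximation to g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

def second_derivative(g, t, h=1e-4):
    """Central-difference approximation to g''(t)."""
    return (g(t + h) - 2 * g(t) + g(t - h)) / (h * h)

mu = first_derivative(M, 0.0)                # approximates M'(0) = E(X)
second_moment = second_derivative(M, 0.0)    # approximates M''(0) = E(X^2)
variance = second_moment - mu ** 2
print(mu, second_moment, variance)
```

The printed values are close to 1, 2, and 1, matching the mean and variance obtained analytically.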

In general, if $m$ is a positive integer and if $M^{(m)}(t)$ means the $m$th
derivative of $M(t)$, we have, by repeated differentiation with respect to
$t$,

$$M^{(m)}(0) = E(X^m).$$

Now

$$E(X^m) = \int_{-\infty}^{\infty} x^m f(x)\, dx \quad \text{or} \quad \sum_x x^m f(x),$$

and integrals (or sums) of this sort are, in mechanics, called moments.
Since $M(t)$ generates the values of $E(X^m)$, $m = 1, 2, 3, \ldots$, it is called
the moment-generating function (m.g.f.). In fact, we shall sometimes
call $E(X^m)$ the $m$th moment of the distribution, or the $m$th moment
of $X$.

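Example 1. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} \tfrac{1}{2}(x + 1), & -1 < x < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

Then the mean value of $X$ is

$$\mu = \int_{-\infty}^{\infty} x f(x)\, dx = \int_{-1}^{1} x\, \frac{x + 1}{2}\, dx = \frac{1}{3},$$

while the variance of $X$ is

$$\sigma^2 = \int_{-\infty}^{\infty} x^2 f(x)\, dx - \mu^2 = \int_{-1}^{1} x^2\, \frac{x + 1}{2}\, dx - \left(\frac{1}{3}\right)^2 = \frac{2}{9}.$$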




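Example 2. If $X$ has the p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{x^2}, & 1 < x < \infty, \\ 0 & \text{elsewhere,} \end{cases}$$

then the mean value of $X$ does not exist, since

$$\int_1^{\infty} |x|\, \frac{1}{x^2}\, dx = \lim_{b \to \infty} \int_1^b \frac{1}{x}\, dx = \lim_{b \to \infty} (\ln b - \ln 1)$$

does not exist.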
Example 3. It is known that the series

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots$$

converges to $\pi^2/6$. Then

$$f(x) = \begin{cases} \dfrac{6}{\pi^2 x^2}, & x = 1, 2, 3, \ldots, \\ 0 & \text{elsewhere,} \end{cases}$$

is the p.d.f. of a discrete type of random variable $X$. The m.g.f. of this
distribution, if it exists, is given by

$$M(t) = E(e^{tX}) = \sum_x e^{tx} f(x) = \sum_{x=1}^{\infty} \frac{6 e^{tx}}{\pi^2 x^2}.$$

The ratio test may be used to show that this series diverges if $t > 0$. Thus there
does not exist a positive number $h$ such that $M(t)$ exists for $-h < t < h$.
Accordingly, the distribution having the p.d.f. $f(x)$ of this example does not
have an m.g.f.

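Example 4. Let $X$ have the m.g.f. $M(t) = e^{t^2/2}$, $-\infty < t < \infty$. We can
differentiate $M(t)$ any number of times to find the moments of $X$. However,
it is instructive to consider this alternative method. The function $M(t)$ is
represented by the following MacLaurin's series.

$$e^{t^2/2} = 1 + \frac{1}{1!}\left(\frac{t^2}{2}\right) + \frac{1}{2!}\left(\frac{t^2}{2}\right)^2 + \cdots + \frac{1}{k!}\left(\frac{t^2}{2}\right)^k + \cdots$$

$$= 1 + \frac{1}{2!}\, t^2 + \frac{(3)(1)}{4!}\, t^4 + \cdots + \frac{(2k - 1) \cdots (3)(1)}{(2k)!}\, t^{2k} + \cdots.$$

In general, the MacLaurin's series for $M(t)$ is

$$M(t) = M(0) + \frac{M'(0)}{1!}\, t + \frac{M''(0)}{2!}\, t^2 + \cdots + \frac{M^{(m)}(0)}{m!}\, t^m + \cdots$$

$$= 1 + \frac{E(X)}{1!}\, t + \frac{E(X^2)}{2!}\, t^2 + \cdots + \frac{E(X^m)}{m!}\, t^m + \cdots.$$

Thus the coefficient of $(t^m/m!)$ in the MacLaurin's series representation of $M(t)$
is $E(X^m)$. So, for our particular $M(t)$, we have

$$E(X^{2k}) = (2k - 1)(2k - 3) \cdots (3)(1) = \frac{(2k)!}{2^k\, k!},$$

$k = 1, 2, 3, \ldots$, and $E(X^{2k-1}) = 0$, $k = 1, 2, 3, \ldots$.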
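
The even moments read off the series can be tabulated and compared with the product of odd integers. In this sketch `even_moment` multiplies the coefficient $1/(2^k k!)$ of $t^{2k}$ by $(2k)!$, and `odd_product` forms $(2k - 1)(2k - 3) \cdots (3)(1)$ directly; both names are illustrative.

```python
from fractions import Fraction
from math import factorial

def even_moment(k):
    """E(X^(2k)): (2k)! times the coefficient 1/(2^k k!) of t^(2k) in the series of e^(t^2/2)."""
    return Fraction(factorial(2 * k), 2**k * factorial(k))

def odd_product(k):
    """(2k - 1)(2k - 3)...(3)(1)."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result

moments = [even_moment(k) for k in range(1, 6)]
print([int(m) for m in moments])
```

The odd moments need no computation here, since the series contains no odd powers of $t$.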
Remarks. In a more advanced course, we would not work with the m.g.f.
because so many distributions do not have moment-generating functions.
Instead, we would let $i$ denote the imaginary unit, $t$ an arbitrary real, and we
would define $\varphi(t) = E(e^{itX})$. This expectation exists for every distribution and
it is called the characteristic function of the distribution. To see why $\varphi(t)$ exists
for all real $t$, we note, in the continuous case, that its absolute value

$$|\varphi(t)| = \left| \int_{-\infty}^{\infty} e^{itx} f(x)\, dx \right| \le \int_{-\infty}^{\infty} |e^{itx} f(x)|\, dx.$$

However, $|f(x)| = f(x)$ since $f(x)$ is nonnegative and

$$|e^{itx}| = |\cos tx + i \sin tx| = \sqrt{\cos^2 tx + \sin^2 tx} = 1.$$

Thus

$$|\varphi(t)| \le \int_{-\infty}^{\infty} f(x)\, dx = 1.$$

Accordingly, the integral for $\varphi(t)$ exists for all real values of $t$. In the discrete
case, a summation would replace the integral.

Every distribution has a unique characteristic function; and to each
characteristic function there corresponds a unique distribution of probability.
If $X$ has a distribution with characteristic function $\varphi(t)$, then, for
instance, if $E(X)$ and $E(X^2)$ exist, they are given, respectively, by $iE(X) =
\varphi'(0)$ and $i^2 E(X^2) = \varphi''(0)$. Readers who are familiar with complex-valued
functions may write $\varphi(t) = M(it)$ and, throughout this book, may prove
certain theorems in complete generality.

Those who have studied Laplace and Fourier transforms will note a
similarity between these transforms and $M(t)$ and $\varphi(t)$; it is the uniqueness of
these transforms that allows us to assert the uniqueness of each of the
moment-generating and characteristic functions.
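
The bound $|\varphi(t)| \le 1$ and the relation $\varphi'(0) = iE(X)$ can be illustrated numerically with the standard-library module `cmath`. The discrete p.d.f. $f(x) = x/10$, $x = 1, 2, 3, 4$, the grid of $t$ values, and the finite-difference derivative are all assumptions made for this sketch.

```python
import cmath

# Illustrative discrete p.d.f.: f(x) = x/10 for x = 1, 2, 3, 4
f = {x: x / 10 for x in (1, 2, 3, 4)}

def phi(t):
    """Characteristic function E(e^{itX}) of the discrete distribution above."""
    return sum(p * cmath.exp(1j * t * x) for x, p in f.items())

def derivative(g, t, h=1e-5):
    """Central-difference approximation to g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)

max_modulus = max(abs(phi(t / 10)) for t in range(-100, 101))
mean_from_phi = (derivative(phi, 0.0) / 1j).real   # phi'(0) = i E(X)
print(max_modulus, mean_from_phi)
```

On the grid the modulus never exceeds 1 (up to rounding), and dividing the derivative at zero by $i$ recovers the mean of this distribution, which is 3.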


EXERCISES 


1.89. Find the mean and variance, if they exist, of each of the following
distributions.

(a) $f(x) = \dfrac{3!}{x!\,(3 - x)!} \left(\dfrac{1}{2}\right)^3$, $x = 0, 1, 2, 3$, zero elsewhere.

(b) $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = 2/x^3$, $1 < x < \infty$, zero elsewhere.

1.90. Let $f(x) = (\tfrac{1}{2})^x$, $x = 1, 2, 3, \ldots$, zero elsewhere, be the p.d.f. of the
random variable $X$. Find the m.g.f., the mean, and the variance of $X$.

1.91. For each of the following probability density functions, compute
$\Pr(\mu - 2\sigma < X < \mu + 2\sigma)$.

(a) $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere.

(b) $f(x) = (\tfrac{1}{2})^x$, $x = 1, 2, 3, \ldots$, zero elsewhere.

1.92. If the variance of the random variable $X$ exists, show that

$$E(X^2) \ge [E(X)]^2.$$

1.93. Let a random variable $X$ of the continuous type have a p.d.f. $f(x)$
whose graph is symmetric with respect to $x = c$. If the mean value of $X$
exists, show that $E(X) = c$.

Hint: Show that $E(X - c)$ equals zero by writing $E(X - c)$ as the sum
of two integrals: one from $-\infty$ to $c$ and the other from $c$ to $\infty$. In the first,
let $y = c - x$; and, in the second, $z = x - c$. Finally, use the symmetry
condition $f(c - y) = f(c + y)$ in the first.

1.94. Let the random variable $X$ have mean $\mu$, standard deviation $\sigma$, and
m.g.f. $M(t)$, $-h < t < h$. Show that

$$E\left(\frac{X - \mu}{\sigma}\right) = 0, \qquad E\left[\left(\frac{X - \mu}{\sigma}\right)^2\right] = 1,$$

and

$$E\left\{\exp\left[t\left(\frac{X - \mu}{\sigma}\right)\right]\right\} = e^{-\mu t/\sigma} M\left(\frac{t}{\sigma}\right), \quad -h\sigma < t < h\sigma.$$

1.95. Show that the m.g.f. of the random variable $X$ having the p.d.f. $f(x) =
\tfrac{1}{3}$, $-1 < x < 2$, zero elsewhere, is

$$M(t) = \begin{cases} \dfrac{e^{2t} - e^{-t}}{3t}, & t \ne 0, \\ 1, & t = 0. \end{cases}$$

1.96. Let $X$ be a random variable such that $E[(X - b)^2]$ exists for all real $b$.
Show that $E[(X - b)^2]$ is a minimum when $b = E(X)$.

1.97. Let $X$ denote a random variable for which $E[(X - a)^2]$ exists. Give an
example of a distribution of a discrete type such that this expectation is
zero. Such a distribution is called a degenerate distribution.

1.98. Let $X$ be a random variable such that $K(t) = E(t^X)$ exists for
all real values of $t$ in a certain open interval that includes the point
$t = 1$. Show that $K^{(m)}(1)$ is equal to the $m$th factorial moment
$E[X(X - 1) \cdots (X - m + 1)]$.

1.99. Let $X$ be a random variable. If $m$ is a positive integer, the expectation
$E[(X - b)^m]$, if it exists, is called the $m$th moment of the distribution about
the point $b$. Let the first, second, and third moments of the distribution
about the point 7 be 3, 11, and 15, respectively. Determine the mean $\mu$ of
$X$, and then find the first, second, and third moments of the distribution
about the point $\mu$.
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1400， Let I be a random variable such that R(t) ^ E(e t{X ^ b) ) exists for 
-h<t<h, Ifmisa positive integer, show that /? m) (0) is equal to the mth 
moment of the distribution about the point b. 

1.101. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$ such that the
third moment $E[(X - \mu)^3]$ about the vertical line through $\mu$ exists. The value
of the ratio $E[(X - \mu)^3]/\sigma^3$ is often used as a measure of skewness. Graph
each of the following probability density functions and show that this
measure is negative, zero, and positive for these respective distributions
(which are said to be skewed to the left, not skewed, and skewed to the
right, respectively).

(a) $f(x) = (x + 1)/2$, $-1 < x < 1$, zero elsewhere.

(b) $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, zero elsewhere.

(c) $f(x) = (1 - x)/2$, $-1 < x < 1$, zero elsewhere.

1.102. Let $X$ be a random variable with mean $\mu$ and variance $\sigma^2$ such that the
fourth moment $E[(X - \mu)^4]$ about the vertical line through $\mu$ exists. The
value of the ratio $E[(X - \mu)^4]/\sigma^4$ is often used as a measure of kurtosis.
Graph each of the following probability density functions and show that
this measure is smaller for the first distribution.

(a) $f(x) = \tfrac{1}{2}$, $-1 < x < 1$, zero elsewhere.

(b) $f(x) = 3(1 - x^2)/4$, $-1 < x < 1$, zero elsewhere.

1.103. Let the random variable $X$ have p.d.f.

$$f(x) = \begin{cases} p, & x = -1, 1, \\ 1 - 2p, & x = 0, \\ 0 & \text{elsewhere}, \end{cases}$$

where $0 < p < \tfrac{1}{2}$. Find the measure of kurtosis as a function of $p$. Determine

|ts value when ^ and p = ‘. Note that the kurtosis 

increases as p decreases. 

1.104. Let $\psi(t) = \ln M(t)$, where $M(t)$ is the m.g.f. of a distribution. Prove that
$\psi'(0) = \mu$ and $\psi''(0) = \sigma^2$.

1.105. Find the mean and the variance of the distribution that has the
distribution function

F(x) = 0， x <0, 


H 18 


0 < x < 2,

1.106. Find the moments of the distribution that has m.g.f. $M(t) = (1 - t)^{-3}$,
$t < 1$.

Hint: Find the Maclaurin series for $M(t)$.

1.107. Let $X$ be a random variable of the continuous type with p.d.f. $f(x)$,
which is positive provided $0 < x < b < \infty$, and is equal to zero elsewhere.
Show that

$$E(X) = \int_0^b [1 - F(x)]\,dx,$$

where $F(x)$ is the distribution function of $X$.

1.108. Let $X$ be a random variable of the discrete type with p.d.f. $f(x)$ that
is positive on the nonnegative integers and is equal to zero elsewhere. Show
that

$$E(X) = \sum_{x=0}^{\infty} [1 - F(x)],$$

where $F(x)$ is the distribution function of $X$.

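1.109. Let $X$ have the p.d.f. $f(x) = 1/k$, $x = 1, 2, \ldots, k$, zero elsewhere. Show
that the m.g.f. is

$$M(t) = \begin{cases} \dfrac{e^{t}(1 - e^{kt})}{k(1 - e^{t})}, & t \neq 0, \\ 1, & t = 0. \end{cases}$$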

1.110. Let $X$ have the distribution function $F(x)$ that is a mixture of the
continuous and discrete types, namely

F{x) = 0 ， jc < 0 ， 

= ^ ， 0 S x < 1 ， 

=1 ， l < x. 

Find $\mu = E(X)$ and $\sigma^2 = \operatorname{var}(X)$.

Hint: Determine that part of the p.d.f. associated with each of the
discrete and continuous types, and then sum for the discrete part and
integrate for the continuous part.

1.111. Consider $k$ continuous-type distributions with the following
characteristics: p.d.f. $f_i(x)$, mean $\mu_i$, and variance $\sigma_i^2$, $i = 1, 2, \ldots, k$.
If $c_i \ge 0$, $i = 1, 2, \ldots, k$, and $c_1 + c_2 + \cdots + c_k = 1$, show that the mean
and the variance of the distribution having p.d.f. $c_1 f_1(x) + \cdots + c_k f_k(x)$ are

$$\mu = \sum_{i=1}^{k} c_i \mu_i \quad \text{and} \quad \sigma^2 = \sum_{i=1}^{k} c_i[\sigma_i^2 + (\mu_i - \mu)^2],$$

respectively.

1.10 Chebyshev's Inequality

In this section we prove a theorem that enables us to find upper (or
lower) bounds for certain probabilities. These bounds, however, are
not necessarily close to the exact probabilities and, accordingly, we
ordinarily do not use the theorem to approximate a probability. The
principal uses of the theorem and a special case of it are in theoretical
discussions in other chapters.

Theorem 6. Let $u(X)$ be a nonnegative function of the random
variable $X$. If $E[u(X)]$ exists, then, for every positive constant $c$,

$$\Pr[u(X) \ge c] \le \frac{E[u(X)]}{c}.$$

Proof. The proof is given when the random variable $X$ is of the
continuous type; but the proof can be adapted to the discrete case
if we replace integrals by sums. Let $A = \{x : u(x) \ge c\}$ and let $f(x)$
denote the p.d.f. of $X$. Then

$$E[u(X)] = \int_{-\infty}^{\infty} u(x)f(x)\,dx = \int_{A} u(x)f(x)\,dx + \int_{A^*} u(x)f(x)\,dx.$$

Since each of the integrals in the extreme right-hand member of the
preceding equation is nonnegative, the left-hand member is greater
than or equal to either of them. In particular,

$$E[u(X)] \ge \int_{A} u(x)f(x)\,dx.$$

However, if $x \in A$, then $u(x) \ge c$; accordingly, the right-hand member
of the preceding inequality is not increased if we replace $u(x)$ by $c$. Thus

$$E[u(X)] \ge c\int_{A} f(x)\,dx.$$

Since

$$\int_{A} f(x)\,dx = \Pr(X \in A) = \Pr[u(X) \ge c],$$

it follows that

$$E[u(X)] \ge c\,\Pr[u(X) \ge c],$$

which is the desired result.

The preceding theorem is a generalization of an inequality that is
often called Chebyshev's inequality. This inequality will now be
established.
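Before specializing, Theorem 6 can be checked on a small discrete case. The sketch below is only an illustration: it assumes a fair six-sided die for $X$ and takes $u(x) = x$. Neither choice comes from the text, and the helper name `markov_check` is hypothetical.

```python
from fractions import Fraction

# Assumed illustration: X is the face of a fair die and u(x) = x (nonnegative).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def u(x):
    return x


# E[u(X)], computed exactly as a sum over the support.
expected_u = sum(u(x) * p for x, p in pmf.items())


def markov_check(c):
    """Return (Pr[u(X) >= c], E[u(X)] / c) for a positive constant c."""
    tail = sum((p for x, p in pmf.items() if u(x) >= c), Fraction(0))
    return tail, expected_u / c


for c in (2, 4, 6):
    tail, bound = markov_check(c)
    print(c, tail, bound, tail <= bound)
```

In every row the tail probability stays at or below the bound, though often by a wide margin, which matches the remark that these bounds need not be close to the exact probabilities.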

Theorem 7. Chebyshev's Inequality. Let the random variable $X$ have
a distribution of probability about which we assume only that there is a
finite variance $\sigma^2$. This, of course, implies that there is a mean $\mu$. Then
for every $k > 0$,

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2},$$

or, equivalently,

$$\Pr(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}.$$

Proof. In Theorem 6 take $u(X) = (X - \mu)^2$ and $c = k^2\sigma^2$. Then we
have

$$\Pr[(X - \mu)^2 \ge k^2\sigma^2] \le \frac{E[(X - \mu)^2]}{k^2\sigma^2}.$$

Since the numerator of the right-hand member of the preceding
inequality is $\sigma^2$, the inequality may be written

$$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2},$$

which is the desired result. Naturally, we would take the positive
number $k$ to be greater than 1 to have an inequality of interest.

It is seen that the number $1/k^2$ is an upper bound for the probability
$\Pr(|X - \mu| \ge k\sigma)$. In the following example this upper bound and the
exact value of the probability are compared in special instances.

Example 1. Let $X$ have the p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{2\sqrt{3}}, & -\sqrt{3} < x < \sqrt{3}, \\ 0 & \text{elsewhere}. \end{cases}$$

Here $\mu = 0$ and $\sigma^2 = 1$. If $k = \tfrac{3}{2}$, we have the exact probability

$$\Pr(|X - \mu| \ge k\sigma) = \Pr\left(|X| \ge \tfrac{3}{2}\right) = 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = 1 - \frac{\sqrt{3}}{2}.$$

By Chebyshev's inequality, the preceding probability has the upper bound
$1/k^2 = \tfrac{4}{9}$. Since $1 - \sqrt{3}/2 = 0.134$, approximately, the exact probability in
this case is considerably less than the upper bound $\tfrac{4}{9}$. If we take $k = 2$, we have
the exact probability $\Pr(|X - \mu| \ge 2\sigma) = \Pr(|X| \ge 2) = 0$. This again is
considerably less than the upper bound $1/k^2 = \tfrac{1}{4}$ provided by Chebyshev's
inequality.
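The comparison in this example can be reproduced numerically. The following is a minimal sketch, assuming the uniform distribution on $(-\sqrt{3}, \sqrt{3})$ with $\mu = 0$ and $\sigma = 1$; the function names `exact_tail` and `chebyshev_bound` are introduced here for illustration only.

```python
import math


def exact_tail(k: float) -> float:
    """Pr(|X| >= k) for X uniform on (-sqrt(3), sqrt(3)), so mu = 0, sigma = 1."""
    half_width = math.sqrt(3.0)
    if k >= half_width:
        return 0.0
    # The interval (-k, k) carries probability 2k / (2 * sqrt(3)).
    return 1.0 - k / half_width


def chebyshev_bound(k: float) -> float:
    """Upper bound 1/k^2 from Chebyshev's inequality."""
    return 1.0 / k**2


for k in (1.5, 2.0):
    print(k, round(exact_tail(k), 3), round(chebyshev_bound(k), 3))
```

For both values of $k$ the exact tail probability sits well below $1/k^2$, which is the gap the example points out.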

In each of the instances in the preceding example, the probability
$\Pr(|X - \mu| \ge k\sigma)$ and its upper bound $1/k^2$ differ considerably. This
suggests that this inequality might be made sharper. However, if we
want an inequality that holds for every $k > 0$ and holds for all random
variables having finite variance, such an improvement is impossible, as
is shown by the following example.

Example 2. Let the random variable $X$ of the discrete type have
probabilities $\tfrac{1}{8}, \tfrac{6}{8}, \tfrac{1}{8}$ at the points $x = -1, 0, 1$, respectively. Here $\mu = 0$ and
$\sigma^2 = \tfrac{1}{4}$. If $k = 2$, then $1/k^2 = \tfrac{1}{4}$ and $\Pr(|X - \mu| \ge k\sigma) = \Pr(|X| \ge 1) = \tfrac{1}{4}$. That
is, the probability $\Pr(|X - \mu| \ge k\sigma)$ here attains the upper bound $1/k^2 = \tfrac{1}{4}$.
Hence the inequality cannot be improved without further assumptions about
the distribution of $X$.
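That the bound is attained can be confirmed with exact rational arithmetic. This sketch assumes the probabilities $\tfrac{1}{8}, \tfrac{6}{8}, \tfrac{1}{8}$ at $-1, 0, 1$, which are the values consistent with $\mu = 0$ and with the bound $\tfrac{1}{4}$ being reached at $k = 2$.

```python
from fractions import Fraction

# Assumed p.d.f.: mass 1/8 at -1 and 1, mass 6/8 at 0.
pmf = {-1: Fraction(1, 8), 0: Fraction(6, 8), 1: Fraction(1, 8)}

mean = sum(x * p for x, p in pmf.items())
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

k = 2
# |x - mu| >= k*sigma is tested as (x - mu)^2 >= k^2 * sigma^2 to stay exact.
tail = sum(p for x, p in pmf.items() if (x - mean) ** 2 >= k**2 * variance)
bound = Fraction(1, k**2)

print(mean, variance, tail, bound)  # 0 1/4 1/4 1/4
```

Here `tail == bound`, so no sharper inequality can hold for every distribution with finite variance.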

EXERCISES 


1.112. Let $X$ be a random variable with mean $\mu$ and let $E[(X - \mu)^{2k}]$ exist.
Show, with $d > 0$, that $\Pr(|X - \mu| \ge d) \le E[(X - \mu)^{2k}]/d^{2k}$. This is
essentially Chebyshev's inequality when $k = 1$. The fact that this holds for
all $k = 1, 2, 3, \ldots$, when those $(2k)$th moments exist, usually provides a
much smaller upper bound for $\Pr(|X - \mu| \ge d)$ than does Chebyshev's
result.

1.113. Let $X$ be a random variable such that $\Pr(X \le 0) = 0$ and let $\mu = E(X)$
exist. Show that $\Pr(X \ge 2\mu) \le \tfrac{1}{2}$.

1.114. If $X$ is a random variable such that $E(X) = 3$ and $E(X^2) = 13$, use
Chebyshev's inequality to determine a lower bound for the probability
$\Pr(-2 < X < 8)$.

1.115. Let $X$ be a random variable with m.g.f. $M(t)$, $-h < t < h$. Prove that

$$\Pr(X \ge a) \le e^{-at}M(t), \qquad 0 < t < h,$$

and that

$$\Pr(X \le a) \le e^{-at}M(t), \qquad -h < t < 0.$$

Hint: Let $u(x) = e^{tx}$ and $c = e^{ta}$ in Theorem 6. Note. These results imply
that $\Pr(X \ge a)$ and $\Pr(X \le a)$ are less than the respective greatest lower
bounds for $e^{-at}M(t)$ when $0 < t < h$ and when $-h < t < 0$.

1.116. The m.g.f. of $X$ exists for all real values of $t$ and is given by

$$M(t) = \frac{e^{t} - e^{-t}}{2t}, \quad t \neq 0, \qquad M(0) = 1.$$

Use the results of the preceding exercise to show that $\Pr(X \ge 1) = 0$ and
$\Pr(X \le -1) = 0$. Note that here $h$ is infinite.


ADDITIONAL EXERCISES 

1.117. Players A and B play a sequence of independent games. Player A
throws a die first and wins on a "six." If he fails, B throws and wins on a
"five" or "six." If he fails, A throws again and wins on a "four," "five,"
or "six." And so on. Find the probability of each player winning the
sequence.

1.118. Let $X$ be the number of gallons of ice cream that is requested at a
certain store on a hot summer day. Let us assume that the p.d.f. of $X$ is
$f(x) = 12x(1000 - x)^2/10^{12}$, $0 < x < 1000$, zero elsewhere. How many
gallons of ice cream should the store have on hand each of these days, so
that the probability of exhausting its supply on a particular day is 0.05?

1.119. Find the 25th percentile of the distribution having p.d.f. $f(x) = |x|/4$,
$-2 < x < 2$, zero elsewhere.

1*120. Let A u A 2 , A 3 be independent events with probabilities U, !, 
respectively. Compute Pr(^ju^ 2 u A 3 ). 

1.121. From a bowl containing 5 red, 3 white, and 7 blue chips, select 4 at
random and without replacement. Compute the conditional probability of
1 red, 0 white, and 3 blue chips, given that there are at least 3 blue chips
in this sample of 4 chips.

1* 122， Let the three independent events A, B, and C be such that 
P(^) — P{B) — P{C) = Find P[(A* n B*) u C].. 

1.123. Person A tosses a coin and then person B rolls a die. This is repeated
independently until a head or one of the numbers 1, 2, 3, 4 appears, at which
time the game is stopped. Person A wins with the head and B wins with one
of the numbers 1, 2, 3, 4. Compute the probability that A wins the game.

1.124. Find the mean and variance of the random variable $X$ having
distribution function

F{x) = 0 ， x <0 y 

/ 钃 ， 



V 

<1 

o 


JCI4 

= 
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a. 

1 <x <2, 

2 <x. 

1.125. Let $X$ be a random variable having distribution function

F(x) — 0, x < 0, 

4 

= lx 2 , 0 < x <j, 

=i - 2(1 — x > 2 ， 

=1 ， I < X, 

Find Pr (i < A" < |) and the variance of the distribution. 

Hint: Note that there is a step in $F(x)$.

1.126. Bowl I contains 7 red and 3 white chips and bowl II has 4 red and 6
white chips. Two chips are selected at random and without replacement
from I and transferred to II. Three chips are then selected at random and
without replacement from II.

(a) What is the probability that all three are white? 

(b) Given that three white chips are selected from II, what is the 
conditional probability that two white chips were transferred from I? 

1.127. A bowl contains ten chips numbered 1, 2, ..., 10, respectively. Five
chips are drawn at random, one at a time, and without replacement. What
is the probability that exactly two even-numbered chips are drawn and they
occur on even-numbered draws?

1.128. Let £(JT) = — — ， r = 1 ， 2, 3, ■., ■ Find the series representation for 

r + 1 

the m.g.f. of X. Sum this series* 

1.129. Let $X$ have the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Compute
the probability that X is at least | given that X is at least 

1.130. Divide a line segment into two parts by selecting a point at random.
Find the probability that the larger segment is at least three times the
shorter. Assume a uniform distribution.

1.131. Three chips are selected at random and without replacement from a
bowl containing 5 white, 4 black, and 7 red chips. Find the probability that
these three chips are alike in color.

1.132. Factories A, B, and C produce, respectively, 20, 30, and 50% of a
certain company’s output. The items produced in A, B, and C are 1, 2, and 
3 percent defective, respectively. We observe one item from the company's 
output at random and find it defective. What is the conditional probability 
that the item was from A? 


1.133. The probabilities that the independent events A, B, and C will occur

are ! ， I， and 夂 What is the probability that at least one of the three events 
will happen? 

1.134. A person bets 1 dollar to b dollars that he can draw two cards from 
an ordinary deck without replacement and that they will be of the same suit. 
Find b so that the bet will be fair. 

1.135. A bowl contains 6 chips: 4 are red and 2 are white. Three chips are
selected at random and without replacement; then a coin is tossed a number
of independent times that is equal to the number of red chips in this sample
of 3. For example, if we have 2 red and 1 white, the coin is tossed twice.
Given that one head results, compute the conditional probability that the
sample contains 1 red and 2 white.




CHAPTER 2

Multivariate Distributions


2.1 Distributions of Two Random Variables

We begin the discussion of two random variables with the following
example. A coin is to be tossed three times and our interest is in
the ordered number pair (number of H's on first two tosses, number
of H's on all three tosses), where H and T represent, respectively, heads
and tails. Thus the sample space is $\mathcal{C} = \{c : c = c_i, i = 1, 2, \ldots, 8\}$,
where $c_1$ is TTT, $c_2$ is TTH, $c_3$ is THT, $c_4$ is HTT, $c_5$ is THH, $c_6$ is
HTH, $c_7$ is HHT, and $c_8$ is HHH. Let $X_1$ and $X_2$ be two functions
such that $X_1(c_1) = X_1(c_2) = 0$, $X_1(c_3) = X_1(c_4) = X_1(c_5) = X_1(c_6) = 1$,
$X_1(c_7) = X_1(c_8) = 2$; and $X_2(c_1) = 0$, $X_2(c_2) = X_2(c_3) = X_2(c_4) = 1$,
$X_2(c_5) = X_2(c_6) = X_2(c_7) = 2$, $X_2(c_8) = 3$. Thus $X_1$ and $X_2$ are
real-valued functions defined on the sample space $\mathcal{C}$, which take us
from that sample space to the space of ordered number pairs

$$\mathcal{A} = \{(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (2, 3)\}.$$

Thus $X_1$ and $X_2$ are two random variables defined on the space $\mathcal{C}$,
and, in this example, the space of these random variables is the
two-dimensional set $\mathcal{A}$ given immediately above. We now formulate the
definition of the space of two random variables.

Definition 1. Given a random experiment with a sample space $\mathcal{C}$.
Consider two random variables $X_1$ and $X_2$, which assign to each
element $c$ of $\mathcal{C}$ one and only one ordered pair of numbers $X_1(c) = x_1$,
$X_2(c) = x_2$. The space of $X_1$ and $X_2$ is the set of ordered pairs

$$\mathcal{A} = \{(x_1, x_2) : x_1 = X_1(c),\ x_2 = X_2(c),\ c \in \mathcal{C}\}.$$

Let $\mathcal{A}$ be the space associated with the two random variables $X_1$ and
$X_2$ and let $A$ be a subset of $\mathcal{A}$. As in the case of one random variable,
we shall speak of the event $A$. We wish to define the probability of the
event $A$, which we denote by $\Pr[(X_1, X_2) \in A]$. Take $C = \{c : c \in \mathcal{C}$ and
$[X_1(c), X_2(c)] \in A\}$, where $\mathcal{C}$ is the sample space. We then define
$\Pr[(X_1, X_2) \in A] = P(C)$, where $P$ is the probability set function
defined for subsets $C$ of $\mathcal{C}$. Here again we could denote $\Pr[(X_1, X_2) \in A]$
by the probability set function $P_{X_1, X_2}(A)$; but, with our previous
convention, we simply write

$$P(A) = \Pr[(X_1, X_2) \in A].$$

Again it is important to observe that this function is a probability set
function defined for subsets $A$ of the space $\mathcal{A}$.

Let us return to the example in our discussion of two random
variables. Consider the subset $A$ of $\mathcal{A}$, where $A = \{(1, 1), (1, 2)\}$.
To compute $\Pr[(X_1, X_2) \in A] = P(A)$, we must include as elements of $C$
all outcomes in $\mathcal{C}$ for which the random variables $X_1$ and $X_2$ take values
$(x_1, x_2)$ which are elements of $A$. Now $X_1(c_3) = 1$, $X_2(c_3) = 1$,
$X_1(c_4) = 1$, and $X_2(c_4) = 1$. Also, $X_1(c_5) = 1$, $X_2(c_5) = 2$, $X_1(c_6) = 1$,
and $X_2(c_6) = 2$. Thus $P(A) = \Pr[(X_1, X_2) \in A] = P(C)$, where
$C = \{c_3, c_4, c_5, \text{ or } c_6\}$. Suppose that our probability set function $P(C)$
assigns a probability of $\tfrac{1}{8}$ to each of the eight elements of $\mathcal{C}$. This
assignment seems reasonable if $P(T) = P(H) = \tfrac{1}{2}$ and the tosses are
independent. For illustration,

$$P(\{c_1\}) = \Pr(TTT) = (\tfrac{1}{2})(\tfrac{1}{2})(\tfrac{1}{2}) = \tfrac{1}{8}.$$

Then $P(A)$, which can be written as $\Pr(X_1 = 1, X_2 = 1 \text{ or } 2)$, is equal
to $\tfrac{4}{8} = \tfrac{1}{2}$. It is left for the reader to show that we can tabulate the
probability, which is then assigned to each of the elements of $\mathcal{A}$, with
the following result:

(x_1, x_2)                      (0,0)  (0,1)  (1,1)  (1,2)  (2,2)  (2,3)

Pr[(X_1, X_2) = (x_1, x_2)]      1/8    1/8    2/8    2/8    1/8    1/8

This table depicts the distribution of probability over the elements of
$\mathcal{A}$, the space of the random variables $X_1$ and $X_2$.
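The tabulation can be reproduced by brute-force enumeration of the eight outcomes. The sketch below assumes, as in the discussion above, that each of the eight sequences of three tosses has probability $\tfrac{1}{8}$; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight equally likely outcomes c_1, ..., c_8 as tuples of "T"/"H".
outcomes = list(product("TH", repeat=3))

joint = Counter()
for c in outcomes:
    x1 = c[:2].count("H")  # number of H's on the first two tosses
    x2 = c.count("H")      # number of H's on all three tosses
    joint[(x1, x2)] += Fraction(1, len(outcomes))

for pair in sorted(joint):
    print(pair, joint[pair])

# The event A = {(1, 1), (1, 2)} discussed above.
event_A = {(1, 1), (1, 2)}
prob_A = sum(joint[pair] for pair in event_A)
print(prob_A)  # 1/2
```

Each key of `joint` is a point of the space of $X_1$ and $X_2$, and summing the entries over the points of an event gives its probability.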

Again in statistics we are more interested in the space $\mathcal{A}$ of two
random variables, say $X$ and $Y$, than that of $\mathcal{C}$. Moreover, the notion
of the p.d.f. of one random variable $X$ can be extended to the notion
of the p.d.f. of two or more random variables. Under certain
restrictions on the space $\mathcal{A}$ and the function $f > 0$ on $\mathcal{A}$ (restrictions
that will not be enumerated here), we say that the two random variables
$X$ and $Y$ are of the discrete type or of the continuous type, and have
a distribution of that type, according as the probability set function
$P(A)$, $A \subset \mathcal{A}$, can be expressed as

$$P(A) = \Pr[(X, Y) \in A] = \sum_{A}\sum f(x, y),$$

or as

$$P(A) = \Pr[(X, Y) \in A] = \iint_{A} f(x, y)\,dx\,dy.$$

In either case $f$ is called the p.d.f. of the two random variables $X$ and
$Y$. Of necessity, $P(\mathcal{A}) = 1$ in each case.

We may extend the definition of a p.d.f. $f(x, y)$ over the entire
$xy$-plane by using zero elsewhere. We shall do this consistently so that
tedious, repetitious references to the space $\mathcal{A}$ can be avoided. Once this
is done, we replace

$$\iint_{\mathcal{A}} f(x, y)\,dx\,dy \quad \text{by} \quad \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy.$$

Similarly, after extending the definition of a p.d.f. of the discrete type,
we replace

$$\sum_{\mathcal{A}}\sum f(x, y) \quad \text{by} \quad \sum_{y}\sum_{x} f(x, y).$$

In accordance with this convention (of extending the definition of
a p.d.f.), it is seen that a point function $f$, whether in one or two
variables, essentially satisfies the conditions of being a p.d.f. if (a) $f$
is defined and is nonnegative for all real values of its argument(s) and
if (b) its integral [for the continuous type of random variable(s)], or
its sum [for the discrete type of random variable(s)] over all real values
of its argument(s) is 1.

Finally, if a p.d.f. in one or more variables is explicitly defined, we
can see by inspection whether the random variables are of the
continuous or discrete type. For example, it seems obvious that the p.d.f.

$$f(x, y) = \begin{cases} \dfrac{9}{4^{x+y}}, & x = 1, 2, 3, \ldots,\ y = 1, 2, 3, \ldots, \\ 0 & \text{elsewhere}, \end{cases}$$

is a p.d.f. of two discrete-type random variables $X$ and $Y$, whereas the
p.d.f.

$$f(x, y) = \begin{cases} 4xye^{-x^2 - y^2}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0 & \text{elsewhere}, \end{cases}$$

is clearly a p.d.f. of two continuous-type random variables $X$ and $Y$.
In such cases it seems unnecessary to specify which of the two simpler
types of random variables is under consideration.


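Example 1. Let

$$f(x, y) = \begin{cases} 6x^2y, & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{elsewhere}, \end{cases}$$

be the p.d.f. of two random variables $X$ and $Y$, which must be of the
continuous type. We have, for instance,

$$\Pr(0 < X < \tfrac{3}{4}, \tfrac{1}{3} < Y < 2) = \int_{1/3}^{2}\int_{0}^{3/4} f(x, y)\,dx\,dy$$

$$= \int_{1/3}^{1}\int_{0}^{3/4} 6x^2y\,dx\,dy + \int_{1}^{2}\int_{0}^{3/4} 0\,dx\,dy = \tfrac{3}{8} + 0 = \tfrac{3}{8}.$$

Note that this probability is the volume under the surface $f(x, y) = 6x^2y$ and
above the rectangular set $\{(x, y) : 0 < x < \tfrac{3}{4}, \tfrac{1}{3} < y < 1\}$ in the $xy$-plane.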
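The "volume under the surface" reading suggests a direct numerical check. The sketch below approximates the double integral with a midpoint rule; the integrator `midpoint_double_integral` and its grid size are assumptions made for illustration, not part of the text.

```python
def f(x: float, y: float) -> float:
    """The p.d.f. 6 x^2 y on the unit square, extended by zero elsewhere."""
    if 0.0 < x < 1.0 and 0.0 < y < 1.0:
        return 6.0 * x * x * y
    return 0.0


def midpoint_double_integral(g, x_lo, x_hi, y_lo, y_hi, n=400):
    """Midpoint-rule approximation of the integral of g over a rectangle."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += g(x, y)
    return total * hx * hy


# Pr(0 < X < 3/4, 1/3 < Y < 2); the part with y > 1 contributes zero.
prob = midpoint_double_integral(f, 0.0, 0.75, 1.0 / 3.0, 2.0)
print(round(prob, 4))
```

Integrating over the whole rectangle $\tfrac{1}{3} < y < 2$ works because the p.d.f. has been extended by zero outside the unit square, which is the convention adopted in this section.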

Let the random variables $X$ and $Y$ have the probability set function
$P(A)$, where $A$ is a two-dimensional set. If $A$ is the unbounded set
$\{(u, v) : u \le x, v \le y\}$, where $x$ and $y$ are real numbers, we have

$$P(A) = \Pr[(X, Y) \in A] = \Pr(X \le x, Y \le y).$$

This function of the point $(x, y)$ is called the distribution function of $X$
and $Y$ and is denoted by

$$F(x, y) = \Pr(X \le x, Y \le y).$$

If $X$ and $Y$ are random variables of the continuous type that have p.d.f.
$f(x, y)$, then

$$F(x, y) = \int_{-\infty}^{y}\int_{-\infty}^{x} f(u, v)\,du\,dv.$$

Accordingly, at points of continuity of $f(x, y)$, we have

$$\frac{\partial^2 F(x, y)}{\partial x\,\partial y} = f(x, y).$$

It is left as an exercise to show, in every case, that

$$\Pr(a < X \le b, c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c),$$

for all real constants $a < b$, $c < d$.
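The four-corner formula can be tried on a concrete distribution function. As an assumption for this sketch, take the p.d.f. $6x^2y$ on the unit square from Example 1; integrating it gives $F(x, y) = x^3y^2$ for points in the unit square, with each coordinate clipped to $[0, 1]$ outside it. That closed form is derived here and is not quoted from the text.

```python
from fractions import Fraction


def clip01(t):
    """Clip a coordinate to the interval [0, 1], keeping exact arithmetic."""
    return max(Fraction(0), min(Fraction(1), Fraction(t)))


def F(x, y):
    """Distribution function for f(x, y) = 6 x^2 y on the unit square (derived)."""
    return clip01(x) ** 3 * clip01(y) ** 2


def rectangle_prob(a, b, c, d):
    """Pr(a < X <= b, c < Y <= d) from the four-corner formula."""
    return F(b, d) - F(b, c) - F(a, d) + F(a, c)


p = rectangle_prob(0, Fraction(3, 4), Fraction(1, 3), 2)
print(p)  # 3/8
```

The result agrees with the probability found by direct integration in Example 1, as it should for a continuous distribution where the boundary of the rectangle carries no probability.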

Consider next an experiment in which a person chooses
at random a point $(X, Y)$ from the unit square
$\mathcal{A} = \{(x, y) : 0 < x < 1, 0 < y < 1\}$. Suppose that our interest is not in $X$ or
in $Y$ but in $Z = X + Y$. Once a suitable probability model has been
adopted, we shall see how to find the p.d.f. of $Z$. To be specific, let the
nature of the random experiment be such that it is reasonable to assume
that the distribution of probability over the unit square is uniform.
Then the p.d.f. of $X$ and $Y$ may be written

$$f(x, y) = \begin{cases} 1, & 0 < x < 1,\ 0 < y < 1, \\ 0 & \text{elsewhere}, \end{cases}$$

and this describes the probability model. Now let the distribution
function of $Z$ be denoted by $G(z) = \Pr(X + Y \le z)$. Then

$$G(z) = \begin{cases} 0, & z < 0, \\[4pt] \displaystyle\int_{0}^{z}\int_{0}^{z-x} dy\,dx = \frac{z^2}{2}, & 0 \le z < 1, \\[8pt] \displaystyle 1 - \int_{z-1}^{1}\int_{z-x}^{1} dy\,dx = 1 - \frac{(2 - z)^2}{2}, & 1 \le z < 2, \\[8pt] 1, & 2 \le z. \end{cases}$$

Since $G'(z)$ exists for all values of $z$, the p.d.f. of $Z$ may then be written

$$g(z) = \begin{cases} z, & 0 < z < 1, \\ 2 - z, & 1 \le z < 2, \\ 0 & \text{elsewhere}. \end{cases}$$

It is clear that a different choice of the p.d.f. $f(x, y)$ that describes
the probability model will, in general, lead to a different p.d.f. of
$Z$.
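The distribution function of $Z = X + Y$ can be compared with a simulation. The sketch below assumes the uniform model on the unit square and uses a seeded generator so that the run is repeatable; the sample size and the names `G` and `empirical_cdf` are illustrative choices.

```python
import random


def G(z: float) -> float:
    """Distribution function of Z = X + Y for a uniform point in the unit square."""
    if z < 0.0:
        return 0.0
    if z < 1.0:
        return z * z / 2.0
    if z < 2.0:
        return 1.0 - (2.0 - z) ** 2 / 2.0
    return 1.0


rng = random.Random(0)  # fixed seed, chosen arbitrarily
n = 200_000
samples = [rng.random() + rng.random() for _ in range(n)]


def empirical_cdf(z: float) -> float:
    """Fraction of simulated sums that are at most z."""
    return sum(s <= z for s in samples) / n


for z in (0.5, 1.0, 1.5):
    print(z, G(z), round(empirical_cdf(z), 3))
```

With this many samples the empirical values should differ from $G(z)$ only in the third decimal place or so; a different joint p.d.f. would require a different sampler and would give a different $G$.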

Let $f(x_1, x_2)$ be the p.d.f. of two random variables $X_1$ and $X_2$. From
this point on, for emphasis and clarity, we shall call a p.d.f. or a
distribution function a joint p.d.f. or a joint distribution function when
more than one random variable is involved. Thus $f(x_1, x_2)$ is the joint
p.d.f. of the random variables $X_1$ and $X_2$. Consider the event
$a < X_1 < b$, $a < b$. This event can occur when and only when the event
$a < X_1 < b$, $-\infty < X_2 < \infty$ occurs; that is, the two events are
equivalent, so that they have the same probability. But the probability
of the latter event has been defined and is given by

$$\Pr(a < X_1 < b, -\infty < X_2 < \infty) = \int_{a}^{b}\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2\,dx_1$$

for the continuous case, and by

$$\Pr(a < X_1 < b, -\infty < X_2 < \infty) = \sum_{a < x_1 < b}\ \sum_{x_2} f(x_1, x_2)$$

for the discrete case. Now each of

$$\int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 \quad \text{and} \quad \sum_{x_2} f(x_1, x_2)$$

is a function of $x_1$ alone, say $f_1(x_1)$. Thus, for every $a < b$, we have

$$\Pr(a < X_1 < b) = \int_{a}^{b} f_1(x_1)\,dx_1 \quad \text{(continuous case)},$$

$$\Pr(a < X_1 < b) = \sum_{a < x_1 < b} f_1(x_1) \quad \text{(discrete case)},$$

so that $f_1(x_1)$ is the p.d.f. of $X_1$ alone. Since $f_1(x_1)$ is found by
summing (or integrating) the joint p.d.f. $f(x_1, x_2)$ over all $x_2$ for a
fixed $x_1$, we can think of recording this sum in the "margin" of the
$x_1x_2$-plane. Accordingly, $f_1(x_1)$ is called the marginal p.d.f. of $X_1$. In
like manner

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 \quad \text{(continuous case)},$$

$$f_2(x_2) = \sum_{x_1} f(x_1, x_2) \quad \text{(discrete case)},$$

is called the marginal p.d.f. of $X_2$.

Example 2. Consider a random experiment that consists of drawing at
random one chip from a bowl containing 10 chips of the same shape and size.
Each chip has an ordered pair of numbers on it: one with (1, 1), one with (2, 1),
two with (3, 1), one with (1, 2), two with (2, 2), and three with (3, 2). Let the
random variables $X_1$ and $X_2$ be defined as the respective first and second values
of the ordered pair. Thus the joint p.d.f. $f(x_1, x_2)$ of $X_1$ and $X_2$ can be given
by the following table, with $f(x_1, x_2)$ equal to zero elsewhere.

                      x_1
  x_2         1       2       3      f_2(x_2)

   1        1/10    1/10    2/10      4/10
   2        1/10    2/10    3/10      6/10

f_1(x_1)    2/10    3/10    5/10

The joint probabilities have been summed in each row and each column and
these sums recorded in the margins to give the marginal probability density
functions of $X_1$ and $X_2$, respectively. Note that it is not necessary to have a
formula for $f(x_1, x_2)$ to do this.
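Since no formula for the joint p.d.f. is needed, the margins can be computed straight from the list of chips. This is a small sketch using exact fractions; the list `chips` simply restates the bowl described in the example, and the other names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The ten chips in the bowl, one tuple (x1, x2) per chip.
chips = [(1, 1), (2, 1), (3, 1), (3, 1),
         (1, 2), (2, 2), (2, 2), (3, 2), (3, 2), (3, 2)]
n = len(chips)

# Joint p.d.f.: relative frequency of each ordered pair.
joint = {pair: Fraction(count, n) for pair, count in Counter(chips).items()}

# Marginals: sum the joint p.d.f. over the other coordinate.
f1 = Counter()
f2 = Counter()
for (x1, x2), p in joint.items():
    f1[x1] += p
    f2[x2] += p

print(dict(f1))  # marginal of X1
print(dict(f2))  # marginal of X2
```

Note that `Fraction` reduces automatically, so $\tfrac{2}{10}$ prints as `1/5` and $\tfrac{5}{10}$ as `1/2`.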

Example 3. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = \begin{cases} x_1 + x_2, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0 & \text{elsewhere}. \end{cases}$$

The marginal p.d.f. of $X_1$ is

$$f_1(x_1) = \int_{0}^{1} (x_1 + x_2)\,dx_2 = x_1 + \tfrac{1}{2}, \qquad 0 < x_1 < 1,$$

zero elsewhere, and the marginal p.d.f. of $X_2$ is

$$f_2(x_2) = \int_{0}^{1} (x_1 + x_2)\,dx_1 = \tfrac{1}{2} + x_2, \qquad 0 < x_2 < 1,$$

zero elsewhere. A probability like $\Pr(X_1 \le \tfrac{1}{2})$ can be computed from either
$f_1(x_1)$ or $f(x_1, x_2)$ because

$$\int_{0}^{1/2}\int_{0}^{1} f(x_1, x_2)\,dx_2\,dx_1 = \int_{0}^{1/2} f_1(x_1)\,dx_1 = \tfrac{3}{8}.$$

However, to find a probability like $\Pr(X_1 + X_2 \le 1)$, we must use the joint
p.d.f. $f(x_1, x_2)$ as follows:

$$\int_{0}^{1}\int_{0}^{1 - x_1} (x_1 + x_2)\,dx_2\,dx_1 = \int_{0}^{1} \left[x_1(1 - x_1) + \frac{(1 - x_1)^2}{2}\right] dx_1 = \int_{0}^{1} \left(\frac{1}{2} - \frac{x_1^2}{2}\right) dx_1 = \frac{1}{3}.$$

This latter probability is the volume under the surface $f(x_1, x_2) = x_1 + x_2$
above the set $\{(x_1, x_2) : 0 < x_1, 0 < x_2, x_1 + x_2 \le 1\}$.
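Both probabilities in this example can be checked with iterated one-dimensional quadrature, where the inner limit may depend on the outer variable. The helper `midpoint` below is an assumed, deliberately simple integrator, not a method taken from the text.

```python
def midpoint(g, lo, hi, n=200):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


def joint(x1, x2):
    """Joint p.d.f. x1 + x2 on the unit square (callers stay inside the square)."""
    return x1 + x2


def f1(x1):
    """Marginal p.d.f. of X1: integrate the joint p.d.f. over x2 in (0, 1)."""
    return midpoint(lambda x2: joint(x1, x2), 0.0, 1.0)


# Pr(X1 <= 1/2) needs only the marginal.
p_half = midpoint(f1, 0.0, 0.5)

# Pr(X1 + X2 <= 1) needs the joint p.d.f.: inner limit is 1 - x1.
p_sum = midpoint(
    lambda x1: midpoint(lambda x2: joint(x1, x2), 0.0, 1.0 - x1), 0.0, 1.0
)

print(round(p_half, 6), round(p_sum, 6))
```

The first probability depends on $X_1$ alone, so the marginal suffices; the second involves a region that couples the two variables, which is why the joint p.d.f. is required.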


EXERCISES 


2,1, Let f(x x , x 2 ) — 4x t x 29 0 <x { <1, 0 < x 2 < I, zero elsewhere, be the 
pAS. of X { and X 2 . Find Pr(0 < X, <^\< X 2 < 1), Pr (X, = X 2 \ 
Pr {X x < X 2 ), and Pr (X { < X 2 ). 

Hint: Recall that $\Pr(X_1 = X_2)$ would be the volume under the surface
$f(x_1, x_2) = 4x_1x_2$ and above the line segment $0 < x_1 = x_2 < 1$ in the
$x_1x_2$-plane.

2X Let A y ^ {(x,y):x<2,y<4} $ A 2 = {(x,y):x<2 y y <l}, A 3 = 
{(X ， y):x < 0, y < 4}, and A a = {(jc ， y) : x < 0, y < 1} be subsets of the 
space of two random variables X and Y, which is the entire 
two-dimensional plane. If P(A{) = P(A 2 ) — P(A 3 ) — |, and 尸 (」 4 ) = | s 

find P(A s ) 3 where A 5 ― {(jc ，^): 0 < x < 2, I < y < 4}. 

2.3. Let $F(x, y)$ be the distribution function of $X$ and $Y$. Show that
$\Pr(a < X \le b, c < Y \le d) = F(b, d) - F(b, c) - F(a, d) + F(a, c)$, for all
real constants $a < b$, $c < d$.

2.4. Show that the function $F(x, y)$ that is equal to 1 provided that $x + 2y \ge 1$,
and that is equal to zero provided that $x + 2y < 1$, cannot be a distribution
function of two random variables.

Hint: Find four numbers $a < b$, $c < d$, so that

$$F(b, d) - F(a, d) - F(b, c) + F(a, c)$$

is less than zero.

2.5. Given that the nonnegative function $g(x)$ has the property that

$$\int_{0}^{\infty} g(x)\,dx = 1.$$

Show that

$$f(x_1, x_2) = \frac{2g\left(\sqrt{x_1^2 + x_2^2}\right)}{\pi\sqrt{x_1^2 + x_2^2}}, \qquad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$

zero elsewhere, satisfies the conditions of being a p.d.f. of two
continuous-type random variables $X_1$ and $X_2$.

Hint: Use polar coordinates.

2 表 Lf f(x, y) 0 < x < oo, 0 < j ； < oo, zero elsewhere, be the 

p.dX of ^ and K Then if Z ^ X + Y, compute Pr (Z < 0), Pr (Z < 6) 

aT f ， more generally ， Pr(Z S z), for 0<2<oo. What is the p,dX of 

* 


2.7. Let $X$ and $Y$ have the p.d.f. $f(x, y) = 1$, $0 < x < 1$, $0 < y < 1$, zero
elsewhere. Find the p.d.f. of the product $Z = XY$.

2.8. Let 13 cards be taken, at random and without replacement, from an
ordinary deck of playing cards. If $X$ is the number of spades in these 13
cards, find the p.d.f. of $X$. If, in addition, $Y$ is the number of hearts in these
13 cards, find the probability $\Pr(X = 2, Y = 5)$. What is the joint p.d.f. of
$X$ and $Y$?

2.9. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. described as
follows:


(A ， A) 

f(Xi,X 2 ) 


°> (0,1) (0,2) (1,0) (!J) (1,2) 


\2 


TI 


雩 j 


12 


n 


T5 


and $f(x_1, x_2)$ is equal to zero elsewhere.

(a) Write these probabilities in a rectangular array as in Example 2,
recording each marginal p.d.f. in the "margins."

(b) What is $\Pr(X_1 + X_2 = 1)$?


2.10. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 15x_1^2x_2$, $0 < x_1 < x_2 < 1$,
zero elsewhere. Find each marginal p.d.f. and compute $\Pr(X_1 + X_2 \le 1)$.

Hint: Graph the space of $X_1$ and $X_2$ and carefully choose the limits
of integration in determining each marginal p.d.f.

2.2 Conditional Distributions and Expectations

We shall now discuss the notion of a conditional p.d.f. Let
$X_1$ and $X_2$ denote random variables of the discrete type which
have the joint p.d.f. $f(x_1, x_2)$ which is positive on $\mathcal{A}$ and is
zero elsewhere. Let $f_1(x_1)$ and $f_2(x_2)$ denote, respectively, the
marginal probability density functions of $X_1$ and $X_2$. Take $A_1$ to be
the set $A_1 = \{(x_1, x_2) : x_1 = x_1', -\infty < x_2 < \infty\}$, where $x_1'$ is such
that $P(A_1) = \Pr(X_1 = x_1') = f_1(x_1') > 0$, and take $A_2$ to be the set
$A_2 = \{(x_1, x_2) : -\infty < x_1 < \infty, x_2 = x_2'\}$. Then, by definition, the
conditional probability of the event $A_2$, given the event $A_1$, is

$$P(A_2 \mid A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} = \frac{\Pr(X_1 = x_1', X_2 = x_2')}{\Pr(X_1 = x_1')} = \frac{f(x_1', x_2')}{f_1(x_1')}.$$

That is, if $(x_1, x_2)$ is any point at which $f_1(x_1) > 0$, the conditional
probability that $X_2 = x_2$, given that $X_1 = x_1$, is $f(x_1, x_2)/f_1(x_1)$. With $x_1$
held fast, and with $f_1(x_1) > 0$, this function of $x_2$ satisfies the conditions
of being a p.d.f. of a discrete type of random variable $X_2$ because
$f(x_1, x_2)/f_1(x_1)$ is nonnegative and

$$\sum_{x_2} \frac{f(x_1, x_2)}{f_1(x_1)} = \frac{1}{f_1(x_1)} \sum_{x_2} f(x_1, x_2) = \frac{f_1(x_1)}{f_1(x_1)} = 1.$$

We now define the symbol $f_{2|1}(x_2|x_1)$ by the relation

$$f_{2|1}(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}, \qquad f_1(x_1) > 0,$$

and we call $f_{2|1}(x_2|x_1)$ the conditional p.d.f. of the discrete type of
random variable $X_2$, given that the discrete type of random variable
$X_1 = x_1$. In a similar manner we define the symbol $f_{1|2}(x_1|x_2)$ by the
relation

$$f_{1|2}(x_1|x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \qquad f_2(x_2) > 0,$$

and we call $f_{1|2}(x_1|x_2)$ the conditional p.d.f. of the discrete type of
random variable $X_1$, given that the discrete type of random variable
$X_2 = x_2$.

Now let $X_1$ and $X_2$ denote random variables of the continuous type
that have the joint p.d.f. $f(x_1, x_2)$ and the marginal probability density
functions $f_1(x_1)$ and $f_2(x_2)$, respectively. We shall use the results of the
preceding paragraph to motivate a definition of a conditional p.d.f. of
a continuous type of random variable. When $f_1(x_1) > 0$, we define the
symbol $f_{2|1}(x_2|x_1)$ by the relation

$$f_{2|1}(x_2|x_1) = \frac{f(x_1, x_2)}{f_1(x_1)}.$$

In this relation, $x_1$ is to be thought of as having a fixed (but any fixed)
value for which $f_1(x_1) > 0$. It is evident that $f_{2|1}(x_2|x_1)$ is nonnegative
and that

$$\int_{-\infty}^{\infty} f_{2|1}(x_2|x_1)\,dx_2 = \int_{-\infty}^{\infty} \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2 = \frac{1}{f_1(x_1)} \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \frac{1}{f_1(x_1)}\,f_1(x_1) = 1.$$

That is, $f_{2|1}(x_2|x_1)$ has the properties of a p.d.f. of one continuous type
of random variable. It is called the conditional p.d.f. of the continuous
type of random variable $X_2$, given that the continuous type of random
variable $X_1$ has the value $x_1$. When $f_2(x_2) > 0$, the conditional p.d.f. of
the continuous type of random variable $X_1$, given that the continuous
type of random variable $X_2$ has the value $x_2$, is defined by

$$f_{1|2}(x_1|x_2) = \frac{f(x_1, x_2)}{f_2(x_2)}, \qquad f_2(x_2) > 0.$$

Since each of $f_{2|1}(x_2|x_1)$ and $f_{1|2}(x_1|x_2)$ is a p.d.f. of one random
variable (whether of the discrete or the continuous type), each has all
the properties of such a p.d.f. Thus we can compute probabilities and
mathematical expectations. If the random variables are of the
continuous type, the probability

$$\Pr(a < X_2 < b \mid X_1 = x_1) = \int_{a}^{b} f_{2|1}(x_2|x_1)\,dx_2$$

is called "the conditional probability that $a < X_2 < b$, given that
$X_1 = x_1$." If there is no ambiguity, this may be written in the
form $\Pr(a < X_2 < b \mid x_1)$. Similarly, the conditional probability that
$c < X_1 < d$, given $X_2 = x_2$, is

$$\Pr(c < X_1 < d \mid X_2 = x_2) = \int_{c}^{d} f_{1|2}(x_1|x_2)\,dx_1.$$

If $u(X_2)$ is a function of $X_2$, the expectation

$$E[u(X_2) \mid x_1] = \int_{-\infty}^{\infty} u(x_2) f_{2|1}(x_2|x_1)\,dx_2$$

is called the conditional expectation of $u(X_2)$, given that $X_1 = x_1$.
In particular, if they do exist, then $E(X_2 \mid x_1)$ is the mean and
$E\{[X_2 - E(X_2 \mid x_1)]^2 \mid x_1\}$ is the variance of the conditional distribution of
$X_2$, given $X_1 = x_1$, which can be written more simply as $\operatorname{var}(X_2 \mid x_1)$. It
is convenient to refer to these as the "conditional mean" and the
"conditional variance" of $X_2$, given $X_1 = x_1$. Of course, we have

$$\operatorname{var}(X_2 \mid x_1) = E(X_2^2 \mid x_1) - [E(X_2 \mid x_1)]^2$$

from an earlier result. In like manner, the conditional expectation of
$u(X_1)$, given $X_2 = x_2$, is given by

$$E[u(X_1) \mid x_2] = \int_{-\infty}^{\infty} u(x_1) f_{1|2}(x_1|x_2)\,dx_1.$$

With random variables of the discrete type, these conditional
probabilities and conditional expectations are computed by using
summation instead of integration. An illustrative example follows.

Example 1. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = \begin{cases} 2, & 0 < x_1 < x_2 < 1, \\ 0 & \text{elsewhere}. \end{cases}$$

Then the marginal probability density functions are, respectively,

$$f_1(x_1) = \int_{x_1}^{1} 2\,dx_2 = 2(1 - x_1), \qquad 0 < x_1 < 1,$$

zero elsewhere, and

$$f_2(x_2) = \int_{0}^{x_2} 2\,dx_1 = 2x_2, \qquad 0 < x_2 < 1,$$

zero elsewhere. The conditional p.d.f. of $X_1$, given $X_2 = x_2$, $0 < x_2 < 1$, is

$$f_{1|2}(x_1|x_2) = \frac{2}{2x_2} = \frac{1}{x_2}, \qquad 0 < x_1 < x_2,$$

zero elsewhere. Here the conditional mean and conditional variance of $X_1$,
given $X_2 = x_2$, are, respectively,

$$E(X_1 \mid x_2) = \int_{-\infty}^{\infty} x_1 f_{1|2}(x_1|x_2)\,dx_1 = \int_{0}^{x_2} x_1 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2}{2}, \qquad 0 < x_2 < 1,$$

and

$$\operatorname{var}(X_1 \mid x_2) = \int_{0}^{x_2} \left(x_1 - \frac{x_2}{2}\right)^2 \left(\frac{1}{x_2}\right) dx_1 = \frac{x_2^2}{12}, \qquad 0 < x_2 < 1.$$

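Finally, we shall compare the values of

$$\Pr(0 < X_1 < \tfrac12 \mid X_2 = \tfrac34) \quad \text{and} \quad \Pr(0 < X_1 < \tfrac12).$$

We have

$$\Pr(0 < X_1 < \tfrac12 \mid X_2 = \tfrac34) = \int_0^{1/2} f_{1|2}(x_1|\tfrac34)\,dx_1 = \int_0^{1/2} \tfrac43\,dx_1 = \tfrac23,$$

but

$$\Pr(0 < X_1 < \tfrac12) = \int_0^{1/2} f_1(x_1)\,dx_1 = \int_0^{1/2} 2(1 - x_1)\,dx_1 = \tfrac34.$$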
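The quantities in Example 1 can be checked with exact rational arithmetic. The following is a minimal sketch, not part of the original text: the helper `integrate_poly` is a name introduced here for illustration, and the conditioning value $x_2 = 3/4$ is assumed.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, lo, hi):
    """Exactly integrate sum(coeffs[k] * x**k) over [lo, hi] (hypothetical helper)."""
    lo, hi = F(lo), F(hi)
    return sum(F(c) * (hi ** (k + 1) - lo ** (k + 1)) / (k + 1)
               for k, c in enumerate(coeffs))

x2 = F(3, 4)                      # assumed conditioning value X2 = 3/4
density = 1 / x2                  # f_{1|2}(x1 | x2) = 1/x2 on 0 < x1 < x2

cond_mean = integrate_poly([0, density], 0, x2)          # integral of x1 / x2
cond_second = integrate_poly([0, 0, density], 0, x2)     # integral of x1**2 / x2
cond_var = cond_second - cond_mean ** 2

p_cond = integrate_poly([density], 0, F(1, 2))           # Pr(0 < X1 < 1/2 | X2 = 3/4)
p_marg = integrate_poly([2, -2], 0, F(1, 2))             # f1(x1) = 2 - 2*x1

print(cond_mean, cond_var, p_cond, p_marg)
```

The conditional mean and variance agree with $x_2/2$ and $x_2^2/12$ at $x_2 = 3/4$, and the conditional and unconditional probabilities differ, which is the point of the comparison.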

Since $E(X_2|x_1)$ is a function of $x_1$, then $E(X_2|X_1)$ is a random
variable with its own distribution, mean, and variance. Let us consider
the following illustration of this.

Example 2. Let $X_1$ and $X_2$ have the joint p.d.f.

$$f(x_1, x_2) = \begin{cases} 6x_2, & 0 < x_2 < x_1 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Then the marginal p.d.f. of $X_1$ is

$$f_1(x_1) = \int_0^{x_1} 6x_2\,dx_2 = 3x_1^2, \quad 0 < x_1 < 1,$$

zero elsewhere. The conditional p.d.f. of $X_2$, given $X_1 = x_1$, is

$$f_{2|1}(x_2|x_1) = \frac{6x_2}{3x_1^2} = \frac{2x_2}{x_1^2}, \quad 0 < x_2 < x_1,$$

zero elsewhere, where $0 < x_1 < 1$. The conditional mean of $X_2$, given $X_1 = x_1$,
is

$$E(X_2|x_1) = \int_0^{x_1} x_2 \left(\frac{2x_2}{x_1^2}\right) dx_2 = \frac{2}{3}x_1, \quad 0 < x_1 < 1.$$

Now $E(X_2|X_1) = 2X_1/3$ is a random variable, say $Y$. The distribution function
of $Y = 2X_1/3$ is

$$G(y) = \Pr(Y \le y) = \Pr\left(X_1 \le \frac{3y}{2}\right), \quad 0 \le y < \frac23.$$

From the p.d.f. $f_1(x_1)$, we have

$$G(y) = \int_0^{3y/2} 3x_1^2\,dx_1 = \frac{27y^3}{8}, \quad 0 \le y < \frac23.$$

Of course, $G(y) = 0$, if $y < 0$, and $G(y) = 1$, if $\frac23 < y$. The p.d.f., mean, and
variance of $Y = 2X_1/3$ are

$$g(y) = \frac{81y^2}{8}, \quad 0 < y < \frac23,$$

zero elsewhere,

$$E(Y) = \int_0^{2/3} y \left(\frac{81y^2}{8}\right) dy = \frac12,$$

and

$$\operatorname{var}(Y) = \int_0^{2/3} y^2 \left(\frac{81y^2}{8}\right) dy - \frac14 = \frac{1}{60}.$$

Since the marginal p.d.f. of $X_2$ is

$$f_2(x_2) = \int_{x_2}^{1} 6x_2\,dx_1 = 6x_2(1 - x_2), \quad 0 < x_2 < 1,$$

zero elsewhere, it is easy to show that $E(X_2) = \frac12$ and $\operatorname{var}(X_2) = \frac{1}{20}$. That is, here

$$E(Y) = E[E(X_2|X_1)] = E(X_2)$$

and

$$\operatorname{var}(Y) = \operatorname{var}[E(X_2|X_1)] \le \operatorname{var}(X_2).$$
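A rough numerical check of Example 2 is sketched below. It is an illustration only: the midpoint-rule helper `midpoint` and the grid size are choices made here, not part of the text, and the results are approximations rather than exact values.

```python
def midpoint(f, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

f1 = lambda x1: 3 * x1 ** 2            # marginal p.d.f. of X1 on (0, 1)
f2 = lambda x2: 6 * x2 * (1 - x2)      # marginal p.d.f. of X2 on (0, 1)
cond_mean = lambda x1: 2 * x1 / 3      # E(X2 | x1)

# Moments of Y = E(X2 | X1), computed against the distribution of X1.
mean_y = midpoint(lambda x: cond_mean(x) * f1(x), 0, 1)
var_y = midpoint(lambda x: cond_mean(x) ** 2 * f1(x), 0, 1) - mean_y ** 2

# Moments of X2 itself, computed from its marginal p.d.f.
mean_x2 = midpoint(lambda x: x * f2(x), 0, 1)
var_x2 = midpoint(lambda x: x ** 2 * f2(x), 0, 1) - mean_x2 ** 2

print(mean_y, mean_x2, var_y, var_x2)
```

Up to discretization error, the two means coincide at 1/2, while the variance of $Y$ (about 1/60) is smaller than that of $X_2$ (about 1/20).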

Example 2 is excellent, as it provides us with the opportunity to
apply many of these new definitions as well as review the distribution
function technique for finding the distribution of a function of a random
variable, namely $Y = 2X_1/3$. Moreover, the two observations at
the end of Example 2 are no accident because it is true, in general, that

$$E[E(X_2|X_1)] = E(X_2) \quad \text{and} \quad \operatorname{var}[E(X_2|X_1)] \le \operatorname{var}(X_2).$$

To prove these two facts, we must first comment on the expectation
of a function of two random variables, say $u(X_1, X_2)$. We do this for
the continuous case, but the argument holds in the discrete case with
summations replacing integrals. Of course, $Y = u(X_1, X_2)$ is a random
variable and has a p.d.f., say $g(y)$, and

$$E(Y) = \int_{-\infty}^{\infty} y\,g(y)\,dy.$$

However, as before, it can be proved (Section 4.7) that $E(Y)$ equals

$$E[u(X_1, X_2)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1, x_2) f(x_1, x_2)\,dx_1\,dx_2.$$

We call $E[u(X_1, X_2)]$ the expectation (mathematical expectation or
expected value) of $u(X_1, X_2)$, and it can be shown to be a linear
operator as in the one-variable case. We also note that the expected
value of $X_2$ can be found in two ways:

$$E(X_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_1\,dx_2 = \int_{-\infty}^{\infty} x_2 f_2(x_2)\,dx_2,$$

the latter single integral being obtained from the double integral by
integrating on $x_1$ first.

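Example 3. Let $X_1$ and $X_2$ have the p.d.f.

$$f(x_1, x_2) = \begin{cases} 8x_1x_2, & 0 < x_1 < x_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

Then

$$E(X_1X_2^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_1x_2^2 f(x_1, x_2)\,dx_1\,dx_2 = \int_0^1\int_0^{x_2} 8x_1^2x_2^3\,dx_1\,dx_2 = \int_0^1 \tfrac83 x_2^6\,dx_2 = \tfrac{8}{21}.$$

In addition,

$$E(X_2) = \int_0^1\int_0^{x_2} x_2(8x_1x_2)\,dx_1\,dx_2 = \tfrac45.$$

Since $X_2$ has the p.d.f. $f_2(x_2) = 4x_2^3$, $0 < x_2 < 1$, zero elsewhere, the latter
expectation can be found by

$$E(X_2) = \int_0^1 x_2(4x_2^3)\,dx_2 = \tfrac45.$$

Finally,

$$E(7X_1X_2^2 + 5X_2) = 7E(X_1X_2^2) + 5E(X_2) = (7)(\tfrac{8}{21}) + (5)(\tfrac45) = \tfrac{20}{3}.$$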
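The expectations in this example reduce to integrals of monomials over the triangle $0 < x_1 < x_2 < 1$, so they can be verified exactly. The sketch below is illustrative; the function name `moment` is introduced here and is not from the text.

```python
from fractions import Fraction

def moment(i, j):
    """E(X1**i * X2**j) for f(x1, x2) = 8*x1*x2 on 0 < x1 < x2 < 1."""
    # The integrand is 8 * x1**p * x2**q with p = i + 1 and q = j + 1.
    # Integrating x1 over (0, x2) gives x2**(p + 1) / (p + 1);
    # integrating x2 over (0, 1) then gives 1 / ((p + 1) * (p + q + 2)).
    p, q = i + 1, j + 1
    return Fraction(8, (p + 1) * (p + q + 2))

total_mass = moment(0, 0)                       # should be 1 for a p.d.f.
e_x1_x2sq = moment(1, 2)                        # E(X1 * X2**2)
e_x2 = moment(0, 1)                             # E(X2)
combined = 7 * e_x1_x2sq + 5 * e_x2             # linearity of expectation

print(total_mass, e_x1_x2sq, e_x2, combined)
```

The last line uses only linearity of $E$, mirroring the final step of the example.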

We begin the proof of $E[E(X_2|X_1)] = E(X_2)$ and $\operatorname{var}[E(X_2|X_1)] \le \operatorname{var}(X_2)$
by noting that

$$\begin{aligned}
E(X_2) &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 f(x_1, x_2)\,dx_2\,dx_1 \\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} x_2 \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right] f_1(x_1)\,dx_1 \\
&= \int_{-\infty}^{\infty} E(X_2|x_1) f_1(x_1)\,dx_1 \\
&= E[E(X_2|X_1)],
\end{aligned}$$

which is the first result. Consider next, with $\mu_2 = E(X_2)$,

$$\begin{aligned}
\operatorname{var}(X_2) &= E[(X_2 - \mu_2)^2] \\
&= E\{[X_2 - E(X_2|X_1) + E(X_2|X_1) - \mu_2]^2\} \\
&= E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\} \\
&\quad + 2E\{[X_2 - E(X_2|X_1)][E(X_2|X_1) - \mu_2]\}.
\end{aligned}$$

We shall show that the last term of the right-hand member of the
immediately preceding equation is zero. It is equal to

$$\begin{aligned}
&2\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)][E(X_2|x_1) - \mu_2] f(x_1, x_2)\,dx_2\,dx_1 \\
&\quad = 2\int_{-\infty}^{\infty} [E(X_2|x_1) - \mu_2] \left\{\int_{-\infty}^{\infty} [x_2 - E(X_2|x_1)] \frac{f(x_1, x_2)}{f_1(x_1)}\,dx_2\right\} f_1(x_1)\,dx_1.
\end{aligned}$$

But $E(X_2|x_1)$ is the conditional mean of $X_2$, given $X_1 = x_1$. Since the
expression in the inner braces is equal to

$$E(X_2|x_1) - E(X_2|x_1) = 0,$$

the double integral is equal to zero. Accordingly, we have

$$\operatorname{var}(X_2) = E\{[X_2 - E(X_2|X_1)]^2\} + E\{[E(X_2|X_1) - \mu_2]^2\}.$$

The first term in the right-hand member of this equation is nonnegative
because it is the expected value of a nonnegative function, namely
$[X_2 - E(X_2|X_1)]^2$. Since $E[E(X_2|X_1)] = \mu_2$, the second term will be the
$\operatorname{var}[E(X_2|X_1)]$. Hence we have

$$\operatorname{var}(X_2) \ge \operatorname{var}[E(X_2|X_1)],$$

which completes the proof.
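The decomposition established in the proof can be illustrated on a small discrete distribution, where sums replace integrals. The joint p.d.f. below is hypothetical and chosen only for illustration; it does not come from the text.

```python
from fractions import Fraction as F

# Hypothetical joint p.d.f. of (X1, X2) on four points (illustration only).
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(2, 8), (1, 1): F(2, 8)}

# Marginal p.d.f. of X1.
f1 = {}
for (x1, _), p in joint.items():
    f1[x1] = f1.get(x1, 0) + p

def cond_moment(x1, k):
    """E(X2**k | X1 = x1), computed from the joint table."""
    return sum(x2 ** k * p for (a, x2), p in joint.items() if a == x1) / f1[x1]

mu2 = sum(x2 * p for (_, x2), p in joint.items())
var_x2 = sum((x2 - mu2) ** 2 * p for (_, x2), p in joint.items())

# Second term of the decomposition: var[E(X2 | X1)].
var_of_cond_mean = sum((cond_moment(x1, 1) - mu2) ** 2 * p for x1, p in f1.items())
# First term: E{[X2 - E(X2 | X1)]**2}, the mean of the conditional variances.
mean_of_cond_var = sum((cond_moment(x1, 2) - cond_moment(x1, 1) ** 2) * p
                       for x1, p in f1.items())

print(var_x2, mean_of_cond_var, var_of_cond_mean)
```

For this table the two terms add up exactly to $\operatorname{var}(X_2)$, and the second term alone cannot exceed it.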

Intuitively, this result could have this useful interpretation. Both
the random variables $X_2$ and $E(X_2|X_1)$ have the same mean $\mu_2$. If we
did not know $\mu_2$, we could use either of the two random variables to
guess at the unknown $\mu_2$. Since, however, $\operatorname{var}(X_2) \ge \operatorname{var}[E(X_2|X_1)]$ we
would put more reliance in $E(X_2|X_1)$ as a guess. That is, if we
observe the pair $(X_1, X_2)$ to be $(x_1, x_2)$, we would prefer to use $E(X_2|x_1)$
to $x_2$ as a guess at the unknown $\mu_2$. When studying the use of sufficient
statistics in estimation in Chapter 7, we make use of this famous result,
attributed to C. R. Rao and David Blackwell.


EXERCISES 

2.11. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = x_1 + x_2$, $0 < x_1 < 1$,
$0 < x_2 < 1$, zero elsewhere. Find the conditional mean and variance of $X_2$,
given $X_1 = x_1$, $0 < x_1 < 1$.

2.12. Let $f_{1|2}(x_1|x_2) = c_1x_1/x_2^2$, $0 < x_1 < x_2$, $0 < x_2 < 1$, zero elsewhere, and
$f_2(x_2) = c_2x_2^4$, $0 < x_2 < 1$, zero elsewhere, denote, respectively, the
conditional p.d.f. of $X_1$, given $X_2 = x_2$, and the marginal p.d.f. of $X_2$.
Determine:

(a) The constants $c_1$ and $c_2$.

(b) The joint p.d.f. of $X_1$ and $X_2$.

(c) $\Pr(\frac14 < X_1 < \frac12 \mid X_2 = \frac58)$.

(d) $\Pr(\frac14 < X_1 < \frac12)$.

2.13. Let $f(x_1, x_2) = 21x_1^2x_2^3$, $0 < x_1 < x_2 < 1$, zero elsewhere, be the joint
p.d.f. of $X_1$ and $X_2$.

(a) Find the conditional mean and variance of $X_1$, given $X_2 = x_2$,
$0 < x_2 < 1$.

(b) Find the distribution of $Y = E(X_1|X_2)$.

(c) Determine $E(Y)$ and $\operatorname{var}(Y)$ and compare these to $E(X_1)$ and $\operatorname{var}(X_1)$,
respectively.

2.14. If $X_1$ and $X_2$ are random variables of the discrete type having
p.d.f. $f(x_1, x_2) = (x_1 + 2x_2)/18$, $(x_1, x_2) = (1, 1), (1, 2), (2, 1), (2, 2)$, zero
elsewhere, determine the conditional mean and variance of $X_2$, given
$X_1 = x_1$, for $x_1 = 1$ or 2. Also compute $E(3X_1 - 2X_2)$.

2.15. Five cards are drawn at random and without replacement from a bridge
deck. Let the random variables $X_1$, $X_2$, and $X_3$ denote, respectively, the
number of spades, the number of hearts, and the number of diamonds that
appear among the five cards.

(a) Determine the joint p.d.f. of $X_1$, $X_2$, and $X_3$.

(b) Find the marginal probability density functions of $X_1$, $X_2$, and $X_3$.

(c) What is the joint conditional p.d.f. of $X_2$ and $X_3$, given that $X_1 = 3$?

2.16. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2)$ described as follows:

$(x_1, x_2)$

(0,0) (0,1) (1,0) (1,1) (2,0) (2,1)

$f(x_1, x_2)$

j 3 4 1 6 Z 

IS W TI Ts 18 Is 


and $f(x_1, x_2)$ is equal to zero elsewhere. Find the two marginal probability
density functions and the two conditional means.

Hint: Write the probabilities in a rectangular array.

2.17. Let us choose at random a point from the interval (0, 1) and let the
random variable $X_1$ be equal to the number which corresponds to that point.
Then choose a point at random from the interval $(0, x_1)$, where $x_1$ is the
experimental value of $X_1$; and let the random variable $X_2$ be equal to the
number which corresponds to this point.

(a) Make assumptions about the marginal p.d.f. $f_1(x_1)$ and the conditional
p.d.f. $f_{2|1}(x_2|x_1)$.

(b) Compute $\Pr(X_1 + X_2 \ge 1)$.

(c) Find the conditional mean $E(X_1|x_2)$.

2.18. Let $f(x)$ and $F(x)$ denote, respectively, the p.d.f. and the distribution
function of the random variable $X$. The conditional p.d.f. of $X$, given
$X > x_0$, $x_0$ a fixed number, is defined by $f(x \mid X > x_0) = f(x)/[1 - F(x_0)]$,
$x_0 < x$, zero elsewhere. This kind of conditional p.d.f. finds application in
a problem of time until death, given survival until time $x_0$.

(a) Show that $f(x \mid X > x_0)$ is a p.d.f.

(b) Let $f(x) = e^{-x}$, $0 < x < \infty$, and zero elsewhere. Compute
$\Pr(X > 2 \mid X > 1)$.

2.19. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 6(1 - x - y)$, $0 < x$, $0 < y$,
$x + y < 1$, and zero elsewhere. Compute $\Pr(2X + 3Y < 1)$ and
$E(XY + 2X^2)$.

2.3 The Correlation Coefficient

Because the result that we obtain in this section is more familiar in
terms of $X$ and $Y$, we use $X$ and $Y$ rather than $X_1$ and $X_2$ as symbols
for our two random variables. Let $X$ and $Y$ have joint p.d.f. $f(x, y)$. If
$u(x, y)$ is a function of $x$ and $y$, then $E[u(X, Y)]$ was defined, subject to
its existence, in Section 2.2. The existence of all mathematical
expectations will be assumed in this discussion. The means of $X$ and
$Y$, say $\mu_1$ and $\mu_2$, are obtained by taking $u(x, y)$ to be $x$ and $y$,
respectively; and the variances of $X$ and $Y$, say $\sigma_1^2$ and $\sigma_2^2$, are
obtained by setting the function $u(x, y)$ equal to $(x - \mu_1)^2$ and $(y - \mu_2)^2$,
respectively. Consider the mathematical expectation

$$\begin{aligned}
E[(X - \mu_1)(Y - \mu_2)] &= E(XY - \mu_2X - \mu_1Y + \mu_1\mu_2) \\
&= E(XY) - \mu_2E(X) - \mu_1E(Y) + \mu_1\mu_2 \\
&= E(XY) - \mu_1\mu_2.
\end{aligned}$$

This number is called the covariance of $X$ and $Y$ and is often denoted
by $\operatorname{cov}(X, Y)$. If each of $\sigma_1$ and $\sigma_2$ is positive, the number

$$\rho = \frac{E[(X - \mu_1)(Y - \mu_2)]}{\sigma_1\sigma_2} = \frac{\operatorname{cov}(X, Y)}{\sigma_1\sigma_2}$$

is called the correlation coefficient of $X$ and $Y$. If the standard deviations
are positive, the correlation coefficient of any two random variables is
defined to be the covariance of the two random variables divided by
the product of the standard deviations of the two random variables.
It should be noted that the expected value of the product of two random
variables is equal to the product of their expectations plus their
covariance; that is, $E(XY) = \mu_1\mu_2 + \rho\sigma_1\sigma_2 = \mu_1\mu_2 + \operatorname{cov}(X, Y)$.

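Example 1. Let the random variables $X$ and $Y$ have the joint p.d.f.

$$f(x, y) = \begin{cases} x + y, & 0 < x < 1,\ 0 < y < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

We shall compute the correlation coefficient of $X$ and $Y$. When only two
variables are under consideration, we shall denote the correlation coefficient
by $\rho$. Now

$$\mu_1 = E(X) = \int_0^1\int_0^1 x(x + y)\,dx\,dy = \frac{7}{12}$$

and

$$\sigma_1^2 = E(X^2) - \mu_1^2 = \int_0^1\int_0^1 x^2(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = \frac{11}{144}.$$

Similarly,

$$\mu_2 = E(Y) = \frac{7}{12}$$

and

$$\sigma_2^2 = E(Y^2) - \mu_2^2 = \frac{11}{144}.$$

The covariance of $X$ and $Y$ is

$$E(XY) - \mu_1\mu_2 = \int_0^1\int_0^1 xy(x + y)\,dx\,dy - \left(\frac{7}{12}\right)^2 = -\frac{1}{144}.$$

Accordingly, the correlation coefficient of $X$ and $Y$ is

$$\rho = \frac{-\frac{1}{144}}{\sqrt{\left(\frac{11}{144}\right)\left(\frac{11}{144}\right)}} = -\frac{1}{11}.$$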
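The moments used in this example are integrals of monomials over the unit square, so the correlation coefficient can be recomputed directly. The following sketch is for illustration; `moment` is a helper name introduced here.

```python
from fractions import Fraction as F
import math

def moment(i, j):
    """E(X**i * Y**j) for f(x, y) = x + y on the unit square."""
    # Integral of x**(i+1) * y**j plus integral of x**i * y**(j+1) over (0,1) x (0,1).
    return F(1, (i + 2) * (j + 1)) + F(1, (i + 1) * (j + 2))

mu1, mu2 = moment(1, 0), moment(0, 1)
var1 = moment(2, 0) - mu1 ** 2
var2 = moment(0, 2) - mu2 ** 2
cov = moment(1, 1) - mu1 * mu2           # E(XY) - mu1 * mu2
rho = float(cov) / math.sqrt(float(var1 * var2))

print(mu1, var1, cov, rho)
```

The covariance is small and negative, so $\rho$ is close to zero even though $X$ and $Y$ are not independent.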


Remark. For certain kinds of distributions of two random variables, say
$X$ and $Y$, the correlation coefficient $\rho$ proves to be a very useful characteristic
of the distribution. Unfortunately, the formal definition of $\rho$ does not reveal
this fact. At this time we make some observations about $\rho$, some of which will
be explored more fully at a later stage. It will soon be seen that if a joint
distribution of two variables has a correlation coefficient (that is, if both of
the variances are positive), then $\rho$ satisfies $-1 \le \rho \le 1$. If $\rho = 1$, there is a line
with equation $y = a + bx$, $b > 0$, the graph of which contains all of the
probability of the distribution of $X$ and $Y$. In this extreme case, we have
$\Pr(Y = a + bX) = 1$. If $\rho = -1$, we have the same state of affairs except that
$b < 0$. This suggests the following interesting question: When $\rho$ does not have
one of its extreme values, is there a line in the $xy$-plane such that the
probability for $X$ and $Y$ tends to be concentrated in a band about this line?
Under certain restrictive conditions this is in fact the case, and under those
conditions we can look upon $\rho$ as a measure of the intensity of the
concentration of the probability for $X$ and $Y$ about that line.


Next, let $f(x, y)$ denote the joint p.d.f. of two random variables $X$
and $Y$ and let $f_1(x)$ denote the marginal p.d.f. of $X$. The conditional
p.d.f. of $Y$, given $X = x$, is

$$f_{2|1}(y|x) = \frac{f(x, y)}{f_1(x)}$$

at points where $f_1(x) > 0$. Then the conditional mean of $Y$, given
$X = x$, is given by

$$E(Y|x) = \int_{-\infty}^{\infty} y f_{2|1}(y|x)\,dy = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{f_1(x)},$$

when dealing with random variables of the continuous type. This
conditional mean of $Y$, given $X = x$, is, of course, a function of $x$ alone,
say $u(x)$. In like vein, the conditional mean of $X$, given $Y = y$, is a
function of $y$ alone, say $v(y)$.

In case $u(x)$ is a linear function of $x$, say $u(x) = a + bx$, we say the
conditional mean of $Y$ is linear in $x$; or that $Y$ has a linear conditional
mean. When $u(x) = a + bx$, the constants $a$ and $b$ have simple values
which will now be determined.

It will be assumed that neither $\sigma_1^2$ nor $\sigma_2^2$, the variances of $X$ and $Y$,
is zero. From

$$E(Y|x) = \frac{\int_{-\infty}^{\infty} y f(x, y)\,dy}{f_1(x)} = a + bx,$$

we have

$$\int_{-\infty}^{\infty} y f(x, y)\,dy = (a + bx) f_1(x). \qquad (1)$$

If both members of Equation (1) are integrated on $x$, it is seen that

$$E(Y) = a + bE(X),$$

or

$$\mu_2 = a + b\mu_1, \qquad (2)$$

where $\mu_1 = E(X)$ and $\mu_2 = E(Y)$. If both members of Equation (1) are
first multiplied by $x$ and then integrated on $x$, we have

$$E(XY) = aE(X) + bE(X^2),$$

or

$$\rho\sigma_1\sigma_2 + \mu_1\mu_2 = a\mu_1 + b(\sigma_1^2 + \mu_1^2), \qquad (3)$$

where $\rho\sigma_1\sigma_2$ is the covariance of $X$ and $Y$. The simultaneous solution
of Equations (2) and (3) yields

$$a = \mu_2 - \rho\frac{\sigma_2}{\sigma_1}\mu_1 \quad \text{and} \quad b = \rho\frac{\sigma_2}{\sigma_1}.$$

That is,

$$u(x) = E(Y|x) = \mu_2 + \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)$$

is the conditional mean of $Y$, given $X = x$, when the conditional mean
of $Y$ is linear in $x$. If the conditional mean of $X$, given $Y = y$, is linear
in $y$, then that conditional mean is given by

$$v(y) = E(X|y) = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(y - \mu_2).$$

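We shall next investigate the variance of a conditional distribution
under the assumption that the conditional mean is linear. The
conditional variance of $Y$ is given by

$$\begin{aligned}
\operatorname{var}(Y|x) &= \int_{-\infty}^{\infty} \left[y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f_{2|1}(y|x)\,dy \\
&= \frac{\int_{-\infty}^{\infty} \left[y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f(x, y)\,dy}{f_1(x)} \qquad (4)
\end{aligned}$$

when the random variables are of the continuous type. This variance
is nonnegative and is at most a function of $x$ alone. If then, it is
multiplied by $f_1(x)$ and integrated on $x$, the result obtained will be
nonnegative. This result is

$$\begin{aligned}
&\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[y - \mu_2 - \rho\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right]^2 f(x, y)\,dy\,dx \\
&\quad = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \left[(y - \mu_2)^2 - 2\rho\frac{\sigma_2}{\sigma_1}(y - \mu_2)(x - \mu_1) + \rho^2\frac{\sigma_2^2}{\sigma_1^2}(x - \mu_1)^2\right] f(x, y)\,dy\,dx \\
&\quad = E[(Y - \mu_2)^2] - 2\rho\frac{\sigma_2}{\sigma_1}E[(X - \mu_1)(Y - \mu_2)] + \rho^2\frac{\sigma_2^2}{\sigma_1^2}E[(X - \mu_1)^2] \\
&\quad = \sigma_2^2 - 2\rho\frac{\sigma_2}{\sigma_1}\rho\sigma_1\sigma_2 + \rho^2\frac{\sigma_2^2}{\sigma_1^2}\sigma_1^2 \\
&\quad = \sigma_2^2 - 2\rho^2\sigma_2^2 + \rho^2\sigma_2^2 = \sigma_2^2(1 - \rho^2) \ge 0.
\end{aligned}$$

That is, if the variance, Equation (4), is denoted by $k(x)$, then
$E[k(X)] = \sigma_2^2(1 - \rho^2) \ge 0$. Accordingly, $\rho^2 \le 1$, or $-1 \le \rho \le 1$. It is
left as an exercise to prove that $-1 \le \rho \le 1$ whether the conditional
mean is or is not linear.

Suppose that the variance, Equation (4), is positive but not a
function of $x$; that is, the variance is a constant $k > 0$. Now if $k$ is
multiplied by $f_1(x)$ and integrated on $x$, the result is $k$, so that
$k = \sigma_2^2(1 - \rho^2)$. Thus, in this case, the variance of each conditional
distribution of $Y$, given $X = x$, is $\sigma_2^2(1 - \rho^2)$. If $\rho = 0$, the variance of
each conditional distribution of $Y$, given $X = x$, is $\sigma_2^2$, the variance of
the marginal distribution of $Y$. On the other hand, if $\rho^2$ is near one,
the variance of each conditional distribution of $Y$, given $X = x$, is
relatively small, and there is a high concentration of the probability
for this conditional distribution near the mean
$E(Y|x) = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$.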

It should be pointed out that if the random variables X and Y in 
the preceding discussion are taken to be of the discrete type, the results 
just obtained are valid. 

Example 2. Let the random variables $X$ and $Y$ have the linear conditional
means $E(Y|x) = 4x + 3$ and $E(X|y) = \frac{1}{16}y - 3$. In accordance with the
general formulas for the linear conditional means, we see that $E(Y|x) = \mu_2$ if
$x = \mu_1$ and $E(X|y) = \mu_1$ if $y = \mu_2$. Accordingly, in this special case, we have
$\mu_2 = 4\mu_1 + 3$ and $\mu_1 = \frac{1}{16}\mu_2 - 3$ so that $\mu_1 = -\frac{15}{4}$ and $\mu_2 = -12$. The general
formulas for the linear conditional means also show that the product of the
coefficients of $x$ and $y$, respectively, is equal to $\rho^2$ and that the quotient of these
coefficients is equal to $\sigma_2^2/\sigma_1^2$. Here $\rho^2 = 4(\frac{1}{16}) = \frac14$ with $\rho = \frac12$ (not $-\frac12$), and
$\sigma_2^2/\sigma_1^2 = 64$. Thus, from the two linear conditional means, we are able to find
the values of $\mu_1$, $\mu_2$, $\rho$, and $\sigma_2/\sigma_1$, but not the values of $\sigma_1$ and $\sigma_2$.
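The steps of Example 2 amount to solving two linear equations for the means and then reading $\rho^2$ and $\sigma_2^2/\sigma_1^2$ off the slopes. A small sketch follows; the variable names (`b1`, `a1`, `b2`, `a2`) are introduced here for illustration and stand for the slope and intercept of each conditional mean.

```python
from fractions import Fraction as F
import math

# E(Y|x) = b1*x + a1 and E(X|y) = b2*y + a2, with the coefficients of Example 2.
b1, a1 = F(4), F(3)
b2, a2 = F(1, 16), F(-3)

# The means satisfy mu2 = b1*mu1 + a1 and mu1 = b2*mu2 + a2; substitute and solve.
mu1 = (b2 * a1 + a2) / (1 - b1 * b2)
mu2 = b1 * mu1 + a1

rho_sq = b1 * b2                  # product of the slopes equals rho**2
# rho has the same sign as the (common) sign of the two slopes.
rho = math.copysign(math.sqrt(float(rho_sq)), float(b1))
var_ratio = b1 / b2               # quotient of the slopes equals sigma2**2 / sigma1**2

print(mu1, mu2, rho, var_ratio)
```

The sign of $\rho$ is taken from the slopes because $b = \rho\sigma_2/\sigma_1$ with positive standard deviations, which is why $\rho = \frac12$ and not $-\frac12$ here.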

Example 3. To illustrate how the correlation coefficient measures the
intensity of the concentration of the probability for $X$ and $Y$ about a line, let
these random variables have a distribution that is uniform over the area
depicted in Figure 2.1. That is, the joint p.d.f. of $X$ and $Y$ is

$$f(x, y) = \begin{cases} \dfrac{1}{4ah}, & -a + bx < y < a + bx,\ -h < x < h, \\ 0, & \text{elsewhere.} \end{cases}$$

FIGURE 2.1

We assume here that $b \ge 0$, but the argument can be modified for $b \le 0$. It
is easy to show that the p.d.f. of $X$ is uniform, namely

$$f_1(x) = \int_{-a+bx}^{a+bx} \frac{1}{4ah}\,dy = \frac{1}{2h}, \quad -h < x < h,$$

zero elsewhere. Thus the conditional p.d.f. of $Y$, given $X = x$, is uniform:

$$f_{2|1}(y|x) = \frac{1/4ah}{1/2h} = \frac{1}{2a}, \quad -a + bx < y < a + bx,$$

zero elsewhere. The conditional mean and variance are

$$E(Y|x) = bx \quad \text{and} \quad \operatorname{var}(Y|x) = \frac{a^2}{3}.$$

From the general expressions for those characteristics we know that

$$b = \rho\frac{\sigma_2}{\sigma_1} \quad \text{and} \quad \frac{a^2}{3} = \sigma_2^2(1 - \rho^2).$$

In addition, we know that $\sigma_1^2 = h^2/3$. If we solve these three equations, we
obtain an expression for the correlation coefficient, namely

$$\rho = \frac{bh}{\sqrt{a^2 + b^2h^2}}.$$

Referring to Figure 2.1, we note:

1. As $a$ gets small (large), the straight line effect is more (less) intense and $\rho$
is closer to 1 (zero).

2. As $h$ gets large (small), the straight line effect is more (less) intense and $\rho$
is closer to 1 (zero).

3. As $b$ gets large (small), the straight line effect is more (less) intense and $\rho$
is closer to 1 (zero).
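The closed form for $\rho$ in Example 3 can be cross-checked from the definition of the correlation coefficient. The sketch below assumes the equivalent representation $Y = bX + U$, where $X$ is uniform on $(-h, h)$ and $U$ is uniform on $(-a, a)$ independently of $X$; this representation and the function name `rho` are introduced here for illustration.

```python
import math

def rho(a, b, h):
    """Correlation of X and Y = b*X + U, X ~ uniform(-h, h), U ~ uniform(-a, a)."""
    var_x = h ** 2 / 3
    var_y = b ** 2 * var_x + a ** 2 / 3   # var(bX) plus the variance of the offset U
    cov = b * var_x                        # cov(X, bX + U) = b * var(X)
    return cov / math.sqrt(var_x * var_y)

def rho_closed_form(a, b, h):
    return b * h / math.sqrt(a ** 2 + b ** 2 * h ** 2)

for a, b, h in [(1.0, 1.0, 1.0), (2.0, 0.5, 3.0), (0.1, 1.0, 1.0)]:
    print(a, b, h, rho(a, b, h), rho_closed_form(a, b, h))
```

Shrinking $a$, or enlarging $h$ or $b$, moves the value toward 1, in line with the three observations listed in the example.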

This section will conclude with a definition and an illustrative
example. Let $f(x, y)$ denote the joint p.d.f. of the two random variables
$X$ and $Y$. If $E(e^{t_1X + t_2Y})$ exists for $-h_1 < t_1 < h_1$, $-h_2 < t_2 < h_2$,
where $h_1$ and $h_2$ are positive, it is denoted by $M(t_1, t_2)$ and is called the
moment-generating function (m.g.f.) of the joint distribution of $X$ and
$Y$. As in the case of one random variable, the m.g.f. $M(t_1, t_2)$ completely
determines the joint distribution of $X$ and $Y$, and hence the marginal
distributions of $X$ and $Y$. In fact, the m.g.f. $M_1(t_1)$ of $X$ is

$$M_1(t_1) = E(e^{t_1X}) = M(t_1, 0)$$

and the m.g.f. $M_2(t_2)$ of $Y$ is

$$M_2(t_2) = E(e^{t_2Y}) = M(0, t_2).$$

In addition, in the case of random variables of the continuous type,

$$\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^k\,\partial t_2^m} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m e^{t_1x + t_2y} f(x, y)\,dx\,dy,$$

so that

$$\left.\frac{\partial^{k+m} M(t_1, t_2)}{\partial t_1^k\,\partial t_2^m}\right|_{t_1 = t_2 = 0} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^m f(x, y)\,dx\,dy = E(X^kY^m).$$

For instance, in a simplified notation which appears to be clear,

$$\begin{aligned}
\mu_1 &= E(X) = \frac{\partial M(0, 0)}{\partial t_1}, \qquad \mu_2 = E(Y) = \frac{\partial M(0, 0)}{\partial t_2}, \\
\sigma_1^2 &= E(X^2) - \mu_1^2 = \frac{\partial^2 M(0, 0)}{\partial t_1^2} - \mu_1^2, \\
\sigma_2^2 &= E(Y^2) - \mu_2^2 = \frac{\partial^2 M(0, 0)}{\partial t_2^2} - \mu_2^2, \qquad (5) \\
E[(X - \mu_1)(Y - \mu_2)] &= \frac{\partial^2 M(0, 0)}{\partial t_1\,\partial t_2} - \mu_1\mu_2,
\end{aligned}$$

and from these we can compute the correlation coefficient $\rho$.

It is fairly obvious that the results of Equations (5) hold if $X$ and
$Y$ are random variables of the discrete type. Thus the correlation
coefficients may be computed by using the m.g.f. of the joint
distribution if that function is readily available. An illustrative example
follows. In this, we let $e^w = \exp(w)$.

Example 4. Let the continuous-type random variables $X$ and $Y$ have the
joint p.d.f.

$$f(x, y) = \begin{cases} e^{-y}, & 0 < x < y < \infty, \\ 0, & \text{elsewhere.} \end{cases}$$

The m.g.f. of this joint distribution is

$$M(t_1, t_2) = \int_0^{\infty}\int_x^{\infty} \exp(t_1x + t_2y - y)\,dy\,dx = \frac{1}{(1 - t_1 - t_2)(1 - t_2)},$$

provided that $t_1 + t_2 < 1$ and $t_2 < 1$. For this distribution, Equations (5)
become

$$\begin{aligned}
\mu_1 &= 1, \qquad \mu_2 = 2, \\
\sigma_1^2 &= 1, \qquad \sigma_2^2 = 2, \qquad (6) \\
E[(X - \mu_1)(Y - \mu_2)] &= 1.
\end{aligned}$$

Verification of results of Equations (6) is left as an exercise. If, momentarily,
we accept these results, the correlation coefficient of $X$ and $Y$ is
$\rho = 1/\sqrt{2}$. Furthermore, the moment-generating functions of the marginal
distributions of $X$ and $Y$ are, respectively,

$$M(t_1, 0) = \frac{1}{1 - t_1}, \quad t_1 < 1,$$

$$M(0, t_2) = \frac{1}{(1 - t_2)^2}, \quad t_2 < 1.$$

These moment-generating functions are, of course, respectively, those of
the marginal probability density functions,

$$f_1(x) = \int_x^{\infty} e^{-y}\,dy = e^{-x}, \quad 0 < x < \infty,$$

zero elsewhere, and

$$f_2(y) = e^{-y}\int_0^y dx = ye^{-y}, \quad 0 < y < \infty,$$

zero elsewhere.
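As a numerical illustration of Equations (5) applied to Example 4, the partial derivatives of $M(t_1, t_2)$ at the origin can be approximated by central differences. This is only a sketch: the step size `e` is an arbitrary choice made here, and the printed values are approximations, not a substitute for the verification asked for in the exercises.

```python
import math

def M(t1, t2):
    """Joint m.g.f. of Example 4, valid for t1 + t2 < 1 and t2 < 1."""
    return 1.0 / ((1.0 - t1 - t2) * (1.0 - t2))

e = 1e-3  # finite-difference step (assumed; small relative to the region of existence)

d1 = (M(e, 0) - M(-e, 0)) / (2 * e)                         # dM/dt1 at (0, 0)
d2 = (M(0, e) - M(0, -e)) / (2 * e)                         # dM/dt2 at (0, 0)
d11 = (M(e, 0) - 2 * M(0, 0) + M(-e, 0)) / e ** 2           # second partial in t1
d22 = (M(0, e) - 2 * M(0, 0) + M(0, -e)) / e ** 2           # second partial in t2
d12 = (M(e, e) - M(e, -e) - M(-e, e) + M(-e, -e)) / (4 * e ** 2)  # mixed partial

mu1, mu2 = d1, d2
var1, var2 = d11 - mu1 ** 2, d22 - mu2 ** 2
cov = d12 - mu1 * mu2
rho = cov / math.sqrt(var1 * var2)

print(mu1, mu2, var1, var2, cov, rho)
```

The approximations land close to the values listed in Equations (6) and to $\rho = 1/\sqrt{2}$.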

EXERCISES 

2.20. Let the random variables $X$ and $Y$ have the joint p.d.f.

(a) $f(x, y) = \frac13$, $(x, y) = (0, 0), (1, 1), (2, 2)$, zero elsewhere.

(b) $f(x, y) = \frac13$, $(x, y) = (0, 2), (1, 1), (2, 0)$, zero elsewhere.

(c) $f(x, y) = \frac13$, $(x, y) = (0, 0), (1, 1), (2, 0)$, zero elsewhere.

In each case compute the correlation coefficient of $X$ and $Y$.

2.21. Let $X$ and $Y$ have the joint p.d.f. described as follows:

(x, y):    (1,1)   (1,2)   (1,3)   (2,1)   (2,2)   (2,3)
f(x, y):   2/15    4/15    3/15    1/15    1/15    4/15

and $f(x, y)$ is equal to zero elsewhere. (a) Find the means $\mu_1$ and $\mu_2$, the
variances $\sigma_1^2$ and $\sigma_2^2$, and the correlation coefficient $\rho$. (b) Compute
$E(Y|X = 1)$, $E(Y|X = 2)$, and the line $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Do the points
$[k, E(Y|X = k)]$, $k = 1, 2$, lie on this line?


2.22. Let $f(x, y) = 2$, $0 < x < y$, $0 < y < 1$, zero elsewhere, be the joint p.d.f.
of $X$ and $Y$. Show that the conditional means are, respectively, $(1 + x)/2$,
$0 < x < 1$, and $y/2$, $0 < y < 1$. Show that the correlation coefficient of $X$
and $Y$ is $\rho = \frac12$.

2.23. Show that the variance of the conditional distribution of $Y$, given $X = x$,
in Exercise 2.22, is $(1 - x)^2/12$, $0 < x < 1$, and that the variance of the
conditional distribution of $X$, given $Y = y$, is $y^2/12$, $0 < y < 1$.

2.24. Verify the results of Equations (6) of this section.

2.25. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 1$, $-x < y < x$, $0 < x < 1$,
zero elsewhere. Show that, on the set of positive probability density, the
graph of $E(Y|x)$ is a straight line, whereas that of $E(X|y)$ is not a straight
line.

2.26. If the correlation coefficient $\rho$ of $X$ and $Y$ exists, show that $-1 \le \rho \le 1$.
Hint: Consider the discriminant of the nonnegative quadratic function
$h(v) = E\{[(X - \mu_1) + v(Y - \mu_2)]^2\}$, where $v$ is real and is not a function
of $X$ nor of $Y$.

2.27. Let $\psi(t_1, t_2) = \ln M(t_1, t_2)$, where $M(t_1, t_2)$ is the m.g.f. of $X$ and $Y$.
Show that

$$\frac{\partial \psi(0, 0)}{\partial t_i}, \qquad \frac{\partial^2 \psi(0, 0)}{\partial t_i^2}, \qquad i = 1, 2,$$

and

$$\frac{\partial^2 \psi(0, 0)}{\partial t_1\,\partial t_2}$$

yield the means, the variances, and the covariance of the two random
variables. Use this result to find the means, the variances, and the covariance
of $X$ and $Y$ of Example 4.


2.4 Independent Random Variables 

Let $X_1$ and $X_2$ denote random variables of either the continuous or
the discrete type which have the joint p.d.f. $f(x_1, x_2)$ and marginal
probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively. In
accordance with the definition of the conditional p.d.f. $f_{2|1}(x_2|x_1)$, we
may write the joint p.d.f. $f(x_1, x_2)$ as

$$f(x_1, x_2) = f_{2|1}(x_2|x_1) f_1(x_1).$$

Suppose that we have an instance where $f_{2|1}(x_2|x_1)$ does not depend
upon $x_1$. Then the marginal p.d.f. of $X_2$ is, for random variables of the
continuous type,

$$\begin{aligned}
f_2(x_2) &= \int_{-\infty}^{\infty} f_{2|1}(x_2|x_1) f_1(x_1)\,dx_1 \\
&= f_{2|1}(x_2|x_1) \int_{-\infty}^{\infty} f_1(x_1)\,dx_1 \\
&= f_{2|1}(x_2|x_1).
\end{aligned}$$

Accordingly,

$$f_2(x_2) = f_{2|1}(x_2|x_1) \quad \text{and} \quad f(x_1, x_2) = f_1(x_1) f_2(x_2),$$

when $f_{2|1}(x_2|x_1)$ does not depend upon $x_1$. That is, if the conditional
distribution of $X_2$, given $X_1 = x_1$, is independent of any assumption
about $x_1$, then $f(x_1, x_2) = f_1(x_1) f_2(x_2)$. These considerations motivate
the following definition.

Definition 2. Let the random variables $X_1$ and $X_2$ have the joint
p.d.f. $f(x_1, x_2)$ and the marginal probability density functions $f_1(x_1)$
and $f_2(x_2)$, respectively. The random variables $X_1$ and $X_2$ are said to be
independent if, and only if, $f(x_1, x_2) \equiv f_1(x_1) f_2(x_2)$. Random variables
that are not independent are said to be dependent.

Remarks. Two comments should be made about the preceding definition.
First, the product of two positive functions $f_1(x_1) f_2(x_2)$ means a function
that is positive on a product space. That is, if $f_1(x_1)$ and $f_2(x_2)$ are positive
on, and only on, the respective spaces $\mathcal{A}_1$ and $\mathcal{A}_2$, then the product of
$f_1(x_1)$ and $f_2(x_2)$ is positive on, and only on, the product space
$\mathcal{A} = \{(x_1, x_2) : x_1 \in \mathcal{A}_1,\ x_2 \in \mathcal{A}_2\}$. For instance, if $\mathcal{A}_1 = \{x_1 : 0 < x_1 < 1\}$ and
$\mathcal{A}_2 = \{x_2 : 0 < x_2 < 3\}$, then $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < 1,\ 0 < x_2 < 3\}$. The
second remark pertains to the identity. The identity in Definition 2 should be
interpreted as follows. There may be certain points $(x_1, x_2) \in \mathcal{A}$ at which
$f(x_1, x_2) \ne f_1(x_1) f_2(x_2)$. However, if $A$ is the set of points $(x_1, x_2)$ at which the
equality does not hold, then $P(A) = 0$. In the subsequent theorems and the
subsequent generalizations, a product of nonnegative functions and an
identity should be interpreted in an analogous manner.

Example 1. Let the joint p.d.f. of $X_1$ and $X_2$ be

$$f(x_1, x_2) = \begin{cases} x_1 + x_2, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0, & \text{elsewhere.} \end{cases}$$

It will be shown that $X_1$ and $X_2$ are dependent. Here the marginal probability
density functions are

$$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_2 = \int_0^1 (x_1 + x_2)\,dx_2 = x_1 + \tfrac12, \quad 0 < x_1 < 1,$$

zero elsewhere, and

$$f_2(x_2) = \int_{-\infty}^{\infty} f(x_1, x_2)\,dx_1 = \int_0^1 (x_1 + x_2)\,dx_1 = \tfrac12 + x_2, \quad 0 < x_2 < 1,$$

zero elsewhere. Since $f(x_1, x_2) \not\equiv f_1(x_1) f_2(x_2)$, the random variables $X_1$ and $X_2$ are dependent.

The following theorem makes it possible to assert, without
computing the marginal probability density functions, that the random
variables $X_1$ and $X_2$ of Example 1 are dependent.

Theorem 1. Let the random variables $X_1$ and $X_2$ have the joint p.d.f.
$f(x_1, x_2)$. Then $X_1$ and $X_2$ are independent if and only if $f(x_1, x_2)$ can be
written as a product of a nonnegative function of $x_1$ alone and a
nonnegative function of $x_2$ alone. That is,

$$f(x_1, x_2) \equiv g(x_1) h(x_2),$$

where $g(x_1) > 0$, $x_1 \in \mathcal{A}_1$, zero elsewhere, and $h(x_2) > 0$, $x_2 \in \mathcal{A}_2$, zero
elsewhere.


Proof. If $X_1$ and $X_2$ are independent, then $f(x_1, x_2) \equiv f_1(x_1) f_2(x_2)$,
where $f_1(x_1)$ and $f_2(x_2)$ are the marginal probability density functions
of $X_1$ and $X_2$, respectively. Thus the condition $f(x_1, x_2) \equiv g(x_1) h(x_2)$
is fulfilled.

Conversely, if $f(x_1, x_2) \equiv g(x_1) h(x_2)$, then, for random variables of
the continuous type, we have

$$f_1(x_1) = \int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_2 = g(x_1) \int_{-\infty}^{\infty} h(x_2)\,dx_2 = c_1 g(x_1)$$

and

$$f_2(x_2) = \int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_1 = h(x_2) \int_{-\infty}^{\infty} g(x_1)\,dx_1 = c_2 h(x_2),$$

where $c_1$ and $c_2$ are constants, not functions of $x_1$ or $x_2$. Moreover,
$c_1c_2 = 1$ because

$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x_1) h(x_2)\,dx_1\,dx_2 = \left[\int_{-\infty}^{\infty} g(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} h(x_2)\,dx_2\right] = c_2c_1.$$

These results imply that

$$f(x_1, x_2) \equiv g(x_1) h(x_2) \equiv c_1 g(x_1) c_2 h(x_2) \equiv f_1(x_1) f_2(x_2).$$

Accordingly, $X_1$ and $X_2$ are independent.


If we now refer to Example 1, we see that the joint p.d.f.

$$f(x_1, x_2) = \begin{cases} x_1 + x_2, & 0 < x_1 < 1,\ 0 < x_2 < 1, \\ 0, & \text{elsewhere,} \end{cases}$$

cannot be written as the product of a nonnegative function of $x_1$ alone
and a nonnegative function of $x_2$ alone. Accordingly, $X_1$ and $X_2$ are
dependent.

Example 2. Let the p.d.f. of the random variables $X_1$ and $X_2$ be
$f(x_1, x_2) = 8x_1x_2$, $0 < x_1 < x_2 < 1$, zero elsewhere. The formula $8x_1x_2$ might
suggest to some that $X_1$ and $X_2$ are independent. However, if we consider the
space $\mathcal{A} = \{(x_1, x_2) : 0 < x_1 < x_2 < 1\}$, we see that it is not a product space.
This should make it clear that, in general, $X_1$ and $X_2$ must be dependent if the
space of positive probability density of $X_1$ and $X_2$ is bounded by a curve that
is neither a horizontal nor a vertical line.

We now give a theorem that frequently simplifies the calculations 
of probabilities of events which involve independent variables. 

Theorem 2. If $X_1$ and $X_2$ are independent random variables with
marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively, then

$$\Pr(a < X_1 < b,\ c < X_2 < d) = \Pr(a < X_1 < b)\Pr(c < X_2 < d)$$

for every $a < b$ and $c < d$, where $a$, $b$, $c$, and $d$ are constants.

Proof. From the independence of $X_1$ and $X_2$, the joint p.d.f. of $X_1$
and $X_2$ is $f_1(x_1) f_2(x_2)$. Accordingly, in the continuous case,

$$\begin{aligned}
\Pr(a < X_1 < b,\ c < X_2 < d) &= \int_a^b\int_c^d f_1(x_1) f_2(x_2)\,dx_2\,dx_1 \\
&= \left[\int_a^b f_1(x_1)\,dx_1\right]\left[\int_c^d f_2(x_2)\,dx_2\right] \\
&= \Pr(a < X_1 < b)\Pr(c < X_2 < d);
\end{aligned}$$

or, in the discrete case,

$$\begin{aligned}
\Pr(a < X_1 < b,\ c < X_2 < d) &= \sum_{a < x_1 < b}\ \sum_{c < x_2 < d} f_1(x_1) f_2(x_2) \\
&= \left[\sum_{a < x_1 < b} f_1(x_1)\right]\left[\sum_{c < x_2 < d} f_2(x_2)\right] \\
&= \Pr(a < X_1 < b)\Pr(c < X_2 < d),
\end{aligned}$$

as was to be shown.


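Example 3. In Example 1, $X_1$ and $X_2$ were found to be dependent. There,
in general,

$$\Pr(a < X_1 < b,\ c < X_2 < d) \ne \Pr(a < X_1 < b)\Pr(c < X_2 < d).$$

For instance,

$$\Pr(0 < X_1 < \tfrac12,\ 0 < X_2 < \tfrac12) = \int_0^{1/2}\int_0^{1/2} (x_1 + x_2)\,dx_1\,dx_2 = \tfrac18,$$

whereas

$$\Pr(0 < X_1 < \tfrac12) = \int_0^{1/2} (x_1 + \tfrac12)\,dx_1 = \tfrac38$$

and

$$\Pr(0 < X_2 < \tfrac12) = \int_0^{1/2} (\tfrac12 + x_2)\,dx_2 = \tfrac38.$$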
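The comparison in Example 3 can be reproduced exactly. In the sketch below, `rect_prob` is a helper name introduced here; it integrates the joint p.d.f. $f(x_1, x_2) = x_1 + x_2$ of Example 1 over a rectangle contained in the unit square.

```python
from fractions import Fraction as F

def rect_prob(a, b, c, d):
    """Pr(a < X1 < b, c < X2 < d) for f(x1, x2) = x1 + x2 on the unit square."""
    a, b, c, d = map(F, (a, b, c, d))
    # Integral of x1 over the rectangle plus integral of x2 over the rectangle.
    return (b ** 2 - a ** 2) / 2 * (d - c) + (d ** 2 - c ** 2) / 2 * (b - a)

half = F(1, 2)
joint = rect_prob(0, half, 0, half)     # Pr(0 < X1 < 1/2, 0 < X2 < 1/2)
p1 = rect_prob(0, half, 0, 1)           # Pr(0 < X1 < 1/2)
p2 = rect_prob(0, 1, 0, half)           # Pr(0 < X2 < 1/2)

print(joint, p1 * p2)
```

The joint probability differs from the product of the two marginal probabilities, which is what dependence permits and Theorem 2 rules out for independent variables.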


Not merely are calculations of some probabilities usually simpler 
when we have independent random variables，but many expectations, 
including certain moment-generating functions, have comparably 
simpler computations. The following result will prove so useful that we
state it in the form of a theorem.

Theorem 3. Let the independent random variables $X_1$ and $X_2$ have the
marginal probability density functions $f_1(x_1)$ and $f_2(x_2)$, respectively.
The expected value of the product of a function $u(X_1)$ of $X_1$ alone and
a function $v(X_2)$ of $X_2$ alone is, subject to their existence, equal to
the product of the expected value of $u(X_1)$ and the expected value of $v(X_2)$;
that is,

$$E[u(X_1) v(X_2)] = E[u(X_1)]\,E[v(X_2)].$$

Proof. The independence of $X_1$ and $X_2$ implies that the joint p.d.f.
of $X_1$ and $X_2$ is $f_1(x_1) f_2(x_2)$. Thus we have, by definition of expectation,
in the continuous case,

$$\begin{aligned}
E[u(X_1) v(X_2)] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} u(x_1) v(x_2) f_1(x_1) f_2(x_2)\,dx_1\,dx_2 \\
&= \left[\int_{-\infty}^{\infty} u(x_1) f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} v(x_2) f_2(x_2)\,dx_2\right] \\
&= E[u(X_1)]\,E[v(X_2)];
\end{aligned}$$

or, in the discrete case,

$$\begin{aligned}
E[u(X_1) v(X_2)] &= \sum_{x_2}\sum_{x_1} u(x_1) v(x_2) f_1(x_1) f_2(x_2) \\
&= \left[\sum_{x_1} u(x_1) f_1(x_1)\right]\left[\sum_{x_2} v(x_2) f_2(x_2)\right] \\
&= E[u(X_1)]\,E[v(X_2)],
\end{aligned}$$

as stated in the theorem.


Example 4. Let $X$ and $Y$ be two independent random variables with
means $\mu_1$ and $\mu_2$ and positive variances $\sigma_1^2$ and $\sigma_2^2$, respectively. We shall show
that the independence of $X$ and $Y$ implies that the correlation coefficient of
$X$ and $Y$ is zero. This is true because the covariance of $X$ and $Y$ is equal to

$$E[(X - \mu_1)(Y - \mu_2)] = E(X - \mu_1)\,E(Y - \mu_2) = 0.$$

We shall now prove a very useful theorem about independent
random variables. The proof of the theorem relies heavily upon our
assertion that an m.g.f., when it exists, is unique and that it uniquely
determines the distribution of probability.

Theorem^ LetX t andX 2 denote random variables that have the joint 
p-dj\ J[x {3 x 2 ) and the marginal probability density functions f\{x x ) and 
f 2 (x 2 \ respectively. Furthermore，let M{t u t%) denote the of the 

distribution. Then X、 and X 2 are independent if and only if 

¥ r ^ ■ - r ■ j 

A/(/i ， 6) = 0)Af(0, ^ 2 )- ^ 
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Proof. If X) and X 2 are independent, then 

聊 ," 2 ) = £(〆 ■々 + ⑽) 

=£(〆 ■ 夂 丨 〆 2 ’ 2 ) 

= E(e UX} )E(e t2X2 ) 

= M (/ 卜 0)雕从 

Thus the independence of X { and X 2 implies that the m,g.f+ of the joint 
distribution factors into the product of the moment-generating 
functions of the two marginal distributions. 

Suppose next that the m.g.f. of the joint distribution of $X_1$ and $X_2$ is given by $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$. Now $X_1$ has the unique m.g.f. which, in the continuous case, is given by

$$M(t_1, 0) = \int_{-\infty}^{\infty} e^{t_1x_1} f_1(x_1)\,dx_1.$$

Similarly, the unique m.g.f. of $X_2$, in the continuous case, is given by

$$M(0, t_2) = \int_{-\infty}^{\infty} e^{t_2x_2} f_2(x_2)\,dx_2.$$

Thus we have

$$M(t_1, 0)M(0, t_2) = \left[\int_{-\infty}^{\infty} e^{t_1x_1} f_1(x_1)\,dx_1\right]\left[\int_{-\infty}^{\infty} e^{t_2x_2} f_2(x_2)\,dx_2\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2} f_1(x_1)f_2(x_2)\,dx_1\,dx_2.$$

We are given that $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$; so

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2} f_1(x_1)f_2(x_2)\,dx_1\,dx_2.$$

But $M(t_1, t_2)$ is the m.g.f. of $X_1$ and $X_2$. Thus also

$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x_1 + t_2x_2} f(x_1, x_2)\,dx_1\,dx_2.$$

The uniqueness of the m.g.f. implies that the two distributions of probability that are described by $f_1(x_1)f_2(x_2)$ and $f(x_1, x_2)$ are the same. Thus

$$f(x_1, x_2) \equiv f_1(x_1)f_2(x_2).$$

That is, if $M(t_1, t_2) = M(t_1, 0)M(0, t_2)$, then $X_1$ and $X_2$ are independent. This completes the proof when the random variables are of the continuous type. With random variables of the discrete type, the proof is made by using summation instead of integration.
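As a rough numerical illustration of Theorem 4 in the discrete case, the sketch below evaluates $M(t_1, t_2)$ and $M(t_1, 0)M(0, t_2)$ at a few points for two hypothetical joint p.d.f.s, one independent and one dependent. The names `mgf`, `factorizes`, `independent`, and `dependent`, and the test points, are assumptions made for the example. Agreement at finitely many points is only a necessary check, not a proof of factorization.

```python
import math


def mgf(joint, t1, t2):
    """M(t1, t2) = E[exp(t1*X1 + t2*X2)] for a finite discrete joint p.d.f."""
    return sum(p * math.exp(t1 * x1 + t2 * x2) for (x1, x2), p in joint.items())


def factorizes(joint, points=((0.3, -0.7), (1.0, 0.5), (-0.4, 0.9))):
    """Check M(t1, t2) == M(t1, 0) * M(0, t2) at a few sample points."""
    return all(
        math.isclose(mgf(joint, t1, t2), mgf(joint, t1, 0.0) * mgf(joint, 0.0, t2))
        for t1, t2 in points
    )


# Hypothetical joint p.d.f.s on {0, 1} x {0, 1}.
independent = {(x1, x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
dependent = {(0, 0): 0.5, (1, 1): 0.5}  # X1 = X2 with probability 1

if __name__ == "__main__":
    print(factorizes(independent), factorizes(dependent))
```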

EXERCISES 

2.28. Show that the random variables $X_1$ and $X_2$ with joint p.d.f. $f(x_1, x_2) = 12x_1x_2(1 - x_2)$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere, are independent.

2.29. If the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 2e^{-x_1 - x_2}$, $0 < x_1 < x_2$, $0 < x_2 < \infty$, zero elsewhere, show that $X_1$ and $X_2$ are dependent.

2.30. Let $f(x_1, x_2) = \frac{1}{16}$, $x_1 = 1, 2, 3, 4$, and $x_2 = 1, 2, 3, 4$, zero elsewhere, be the joint p.d.f. of $X_1$ and $X_2$. Show that $X_1$ and $X_2$ are independent.

2.31. Find $\Pr(0 < X_1 < \frac{1}{3},\ 0 < X_2 < \frac{1}{3})$ if the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 4x_1(1 - x_2)$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere.

2.32. Find the probability of the union of the events $a < X_1 < b$, $-\infty < X_2 < \infty$ and $-\infty < X_1 < \infty$, $c < X_2 < d$ if $X_1$ and $X_2$ are two independent variables with $\Pr(a < X_1 < b) = \frac{2}{3}$ and $\Pr(c < X_2 < d) = \frac{5}{8}$.

2.33. If $f(x_1, x_2) = e^{-x_1 - x_2}$, $0 < x_1 < \infty$, $0 < x_2 < \infty$, zero elsewhere, is the joint p.d.f. of the random variables $X_1$ and $X_2$, show that $X_1$ and $X_2$ are independent and that $M(t_1, t_2) = (1 - t_1)^{-1}(1 - t_2)^{-1}$, $t_2 < 1$, $t_1 < 1$. Also show that

$$E(e^{t(X_1 + X_2)}) = (1 - t)^{-2}, \quad t < 1.$$

Accordingly, find the mean and the variance of $Y = X_1 + X_2$.

2.34. Let the random variables $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = 1/\pi$, $(x_1 - 1)^2 + (x_2 + 2)^2 < 1$, zero elsewhere. Find $f_1(x_1)$ and $f_2(x_2)$. Are $X_1$ and $X_2$ independent?

2.35. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 3x$, $0 < y < x < 1$, zero elsewhere. Are $X$ and $Y$ independent? If not, find $E(X \mid y)$.

2.36. Suppose that a man leaves for work between 8:00 a.m. and 8:30 a.m. and takes between 40 and 50 minutes to get to the office. Let $X$ denote the time of departure and let $Y$ denote the time of travel. If we assume that these random variables are independent and uniformly distributed, find the probability that he arrives at the office before 9:00 a.m.

2.5 Extension to Several Random Variables

The notions about two random variables can be extended immediately to $n$ random variables. We make the following definition of the space of $n$ random variables.

Definition 3. Consider a random experiment with the sample space $\mathscr{C}$. Let the random variable $X_i$ assign to each element $c \in \mathscr{C}$ one and only one real number $X_i(c) = x_i$, $i = 1, 2, \ldots, n$. The space of these random variables is the set of ordered $n$-tuples $\mathscr{A} = \{(x_1, x_2, \ldots, x_n) : x_1 = X_1(c), \ldots, x_n = X_n(c),\ c \in \mathscr{C}\}$. Furthermore, let $A$ be a subset of $\mathscr{A}$. Then $\Pr[(X_1, \ldots, X_n) \in A] = P(C)$, where $C = \{c : c \in \mathscr{C} \text{ and } [X_1(c), X_2(c), \ldots, X_n(c)] \in A\}$.

Again we should make the comment that $\Pr[(X_1, \ldots, X_n) \in A]$ could be denoted by the probability set function $P_{X_1, \ldots, X_n}(A)$. But, if there is no chance of misunderstanding, it will be written simply as $P(A)$. We say that the $n$ random variables $X_1, X_2, \ldots, X_n$ are of the discrete type or of the continuous type, and have a distribution of that type, according as the probability set function $P(A)$, $A \subset \mathscr{A}$, can be expressed as

$$P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \sum \cdots \sum_{A} f(x_1, \ldots, x_n),$$

or as

$$P(A) = \Pr[(X_1, \ldots, X_n) \in A] = \int \cdots \int_{A} f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n.$$

In accordance with the convention of extending the definition of a p.d.f., it is seen that a point function $f$ essentially satisfies the conditions of being a p.d.f. if (a) $f$ is defined and is nonnegative for all real values of its argument(s) and if (b) its integral [for the continuous type of random variable(s)], or its sum [for the discrete type of random variable(s)] over all real values of its argument(s) is 1.

The distribution function of the $n$ random variables $X_1, X_2, \ldots, X_n$ is the point function

$$F(x_1, x_2, \ldots, x_n) = \Pr(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n).$$

An illustrative example follows.




Example 1. Let $f(x, y, z) = e^{-(x + y + z)}$, $0 < x, y, z < \infty$, zero elsewhere, be the p.d.f. of the random variables $X$, $Y$, and $Z$. Then the distribution function of $X$, $Y$, and $Z$ is given by

$$F(x, y, z) = \Pr(X \le x, Y \le y, Z \le z) = \int_0^z\int_0^y\int_0^x e^{-u - v - w}\,du\,dv\,dw = (1 - e^{-x})(1 - e^{-y})(1 - e^{-z}), \quad 0 \le x, y, z < \infty,$$

and is equal to zero elsewhere. Incidentally, except for a set of probability measure zero, we have

$$\frac{\partial^3 F(x, y, z)}{\partial x\,\partial y\,\partial z} = f(x, y, z).$$

Let $X_1, X_2, \ldots, X_n$ be random variables having joint p.d.f. $f(x_1, x_2, \ldots, x_n)$ and let $u(X_1, X_2, \ldots, X_n)$ be a function of these variables such that the $n$-fold integral

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, x_2, \ldots, x_n) f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n \tag{1}$$

exists, if the random variables are of the continuous type, or such that the $n$-fold sum

$$\sum_{x_n} \cdots \sum_{x_1} u(x_1, x_2, \ldots, x_n) f(x_1, x_2, \ldots, x_n)$$

exists if the random variables are of the discrete type. The $n$-fold integral (or the $n$-fold sum, as the case may be) is called the expectation, denoted by $E[u(X_1, \ldots, X_n)]$, of the function $u(X_1, X_2, \ldots, X_n)$. In Section 4.7 we show this expectation to be equal to $E(Y)$, where $Y = u(X_1, X_2, \ldots, X_n)$. Of course, $E$ is a linear operator.

We shall now discuss the notions of marginal and conditional probability density functions from the point of view of $n$ random variables. All of the preceding definitions can be directly generalized to the case of $n$ variables in the following manner. Let the random variables $X_1, X_2, \ldots, X_n$ have the joint p.d.f. $f(x_1, x_2, \ldots, x_n)$. If the random variables are of the continuous type, then by an argument similar to the two-variable case, we have for every $a < b$,

$$\Pr(a < X_1 < b) = \int_a^b f_1(x_1)\,dx_1,$$

where $f_1(x_1)$ is defined by the $(n - 1)$-fold integral

$$f_1(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\,dx_2 \cdots dx_n.$$

Therefore, $f_1(x_1)$ is the p.d.f. of the one random variable $X_1$ and $f_1(x_1)$ is called the marginal p.d.f. of $X_1$. The marginal probability density functions $f_2(x_2), \ldots, f_n(x_n)$ of $X_2, \ldots, X_n$, respectively, are similar $(n - 1)$-fold integrals.

Up to this point, each marginal p.d.f. has been a p.d.f. of one random variable. It is convenient to extend this terminology to joint probability density functions, which we shall do now. Here let $f(x_1, x_2, \ldots, x_n)$ be the joint p.d.f. of the $n$ random variables $X_1, X_2, \ldots, X_n$, just as before. Now, however, let us take any group of $k < n$ of these random variables and let us find the joint p.d.f. of them. This joint p.d.f. is called the marginal p.d.f. of this particular group of $k$ variables. To fix the ideas, take $n = 6$, $k = 3$, and let us select the group $X_2, X_4, X_5$. Then the marginal p.d.f. of $X_2, X_4, X_5$ is the joint p.d.f. of this particular group of three variables, namely,

$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1, x_2, x_3, x_4, x_5, x_6)\,dx_1\,dx_3\,dx_6,$$

if the random variables are of the continuous type.

Next we extend the definition of a conditional p.d.f. If $f_1(x_1) > 0$, the symbol $f_{2, \ldots, n|1}(x_2, \ldots, x_n \mid x_1)$ is defined by the relation

$$f_{2, \ldots, n|1}(x_2, \ldots, x_n \mid x_1) = \frac{f(x_1, x_2, \ldots, x_n)}{f_1(x_1)},$$

and $f_{2, \ldots, n|1}(x_2, \ldots, x_n \mid x_1)$ is called the joint conditional p.d.f. of $X_2, \ldots, X_n$, given $X_1 = x_1$. The joint conditional p.d.f. of any $n - 1$ random variables, say $X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n$, given $X_i = x_i$, is defined as the joint p.d.f. of $X_1, X_2, \ldots, X_n$ divided by the marginal p.d.f. $f_i(x_i)$, provided that $f_i(x_i) > 0$. More generally, the joint conditional p.d.f. of $n - k$ of the random variables, for given values of the remaining $k$ variables, is defined as the joint p.d.f. of the $n$ variables divided by the marginal p.d.f. of the particular group of $k$ variables, provided that the latter p.d.f. is positive. We remark that there are many other conditional probability density functions; for instance, see Exercise 2.18.

Because a conditional p.d.f. is a p.d.f. of a certain number of random variables, the expectation of a function of these random variables has been defined. To emphasize the fact that a conditional p.d.f. is under consideration, such expectations are called conditional expectations. For instance, the conditional expectation of $u(X_2, \ldots, X_n)$, given $X_1 = x_1$, is, for random variables of the continuous type, given by

$$E[u(X_2, \ldots, X_n) \mid x_1] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_2, \ldots, x_n) f_{2, \ldots, n|1}(x_2, \ldots, x_n \mid x_1)\,dx_2 \cdots dx_n,$$

provided $f_1(x_1) > 0$ and the integral converges (absolutely). If the random variables are of the discrete type, conditional expectations are, of course, computed by using sums instead of integrals.

Let the random variables $X_1, X_2, \ldots, X_n$ have the joint p.d.f. $f(x_1, x_2, \ldots, x_n)$ and the marginal probability density functions $f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$, respectively. The definition of the independence of $X_1$ and $X_2$ is generalized to the mutual independence of $X_1, X_2, \ldots, X_n$ as follows: The random variables $X_1, X_2, \ldots, X_n$ are said to be mutually independent if and only if

$$f(x_1, x_2, \ldots, x_n) \equiv f_1(x_1)f_2(x_2) \cdots f_n(x_n).$$

It follows immediately from this definition of the mutual independence of $X_1, X_2, \ldots, X_n$ that

$$\Pr(a_1 < X_1 < b_1, a_2 < X_2 < b_2, \ldots, a_n < X_n < b_n) = \Pr(a_1 < X_1 < b_1)\Pr(a_2 < X_2 < b_2) \cdots \Pr(a_n < X_n < b_n) = \prod_{i=1}^{n} \Pr(a_i < X_i < b_i),$$

where the symbol $\prod_{i=1}^{n} \varphi(i)$ is defined to be

$$\prod_{i=1}^{n} \varphi(i) = \varphi(1)\varphi(2) \cdots \varphi(n).$$

The theorem that

$$E[u(X_1)v(X_2)] = E[u(X_1)]\,E[v(X_2)]$$

for independent random variables $X_1$ and $X_2$ becomes, for mutually independent random variables $X_1, X_2, \ldots, X_n$,

$$E[u_1(X_1)u_2(X_2) \cdots u_n(X_n)] = E[u_1(X_1)]\,E[u_2(X_2)] \cdots E[u_n(X_n)],$$

or

$$E\left[\prod_{i=1}^{n} u_i(X_i)\right] = \prod_{i=1}^{n} E[u_i(X_i)].$$

The moment-generating function of the joint distribution of $n$ random variables $X_1, X_2, \ldots, X_n$ is defined as follows. Let

$$E[\exp(t_1X_1 + t_2X_2 + \cdots + t_nX_n)]$$

exist for $-h_i < t_i < h_i$, $i = 1, 2, \ldots, n$, where each $h_i$ is positive. This expectation is denoted by $M(t_1, t_2, \ldots, t_n)$ and it is called the m.g.f. of the joint distribution of $X_1, \ldots, X_n$ (or simply the m.g.f. of $X_1, \ldots, X_n$). As in the cases of one and two variables, this m.g.f. is unique and uniquely determines the joint distribution of the $n$ variables (and hence all marginal distributions). For example, the m.g.f. of the marginal distribution of $X_i$ is $M(0, \ldots, 0, t_i, 0, \ldots, 0)$, $i = 1, 2, \ldots, n$; that of the marginal distribution of $X_i$ and $X_j$ is $M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0)$; and so on. Theorem 4 of this chapter can be generalized, and the factorization

$$M(t_1, t_2, \ldots, t_n) = \prod_{i=1}^{n} M(0, \ldots, 0, t_i, 0, \ldots, 0)$$

is a necessary and sufficient condition for the mutual independence of $X_1, X_2, \ldots, X_n$.
Remark. If $X_1$, $X_2$, and $X_3$ are mutually independent, they are pairwise independent (that is, $X_i$ and $X_j$, $i \ne j$, where $i, j = 1, 2, 3$, are independent). However, the following example, due to S. Bernstein, shows that pairwise independence does not necessarily imply mutual independence. Let $X_1$, $X_2$, and $X_3$ have the joint p.d.f.

$$f(x_1, x_2, x_3) = \tfrac{1}{4}, \quad (x_1, x_2, x_3) \in \{(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)\},$$
$$= 0 \text{ elsewhere.}$$

The joint p.d.f. of $X_i$ and $X_j$, $i \ne j$, is

$$f_{ij}(x_i, x_j) = \tfrac{1}{4}, \quad (x_i, x_j) \in \{(0, 0), (1, 0), (0, 1), (1, 1)\},$$
$$= 0 \text{ elsewhere,}$$

whereas the marginal p.d.f. of $X_i$ is

$$f_i(x_i) = \tfrac{1}{2}, \quad x_i = 0, 1,$$
$$= 0 \text{ elsewhere.}$$

Obviously, if $i \ne j$, we have

$$f_{ij}(x_i, x_j) \equiv f_i(x_i)f_j(x_j),$$

and thus $X_i$ and $X_j$ are independent. However,

$$f(x_1, x_2, x_3) \not\equiv f_1(x_1)f_2(x_2)f_3(x_3).$$

Thus $X_1$, $X_2$, and $X_3$ are not mutually independent.
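Bernstein's example is small enough to verify by enumeration. The following sketch (the helper names `marginal`, `pairwise_independent`, and `mutually_independent` are illustrative, not from the text) builds the joint p.d.f. on its four support points and compares each pair's joint p.d.f., and then the full joint p.d.f., against the corresponding products of marginals.

```python
from fractions import Fraction
from itertools import product

# Joint p.d.f. of Bernstein's example: mass 1/4 on each of four points.
support = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
joint = {pt: Fraction(1, 4) for pt in support}


def marginal(indices):
    """Marginal p.d.f. of the coordinates listed in `indices`."""
    out = {}
    for pt, p in joint.items():
        key = tuple(pt[i] for i in indices)
        out[key] = out.get(key, 0) + p
    return out


def pairwise_independent():
    """True if f_ij(x_i, x_j) = f_i(x_i) f_j(x_j) for every pair i != j."""
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        fij, fi, fj = marginal((i, j)), marginal((i,)), marginal((j,))
        for a, b in product((0, 1), repeat=2):
            if fij.get((a, b), 0) != fi[(a,)] * fj[(b,)]:
                return False
    return True


def mutually_independent():
    """True if f(x1, x2, x3) = f1(x1) f2(x2) f3(x3) at every point."""
    f = [marginal((i,)) for i in range(3)]
    return all(
        joint.get(pt, 0) == f[0][(pt[0],)] * f[1][(pt[1],)] * f[2][(pt[2],)]
        for pt in product((0, 1), repeat=3)
    )


if __name__ == "__main__":
    print(pairwise_independent(), mutually_independent())
```

For instance, the product of the marginals at $(0, 0, 0)$ is $\frac{1}{8}$, while the joint p.d.f. is zero there.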

Example 2. Let $X_1$, $X_2$, and $X_3$ be three mutually independent random variables and let each have the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. The joint p.d.f. of $X_1, X_2, X_3$ is $f(x_1)f(x_2)f(x_3) = 8x_1x_2x_3$, $0 < x_i < 1$, $i = 1, 2, 3$, zero elsewhere. Then, for illustration, the expected value of $5X_1X_2^3 + 3X_2X_3^4$ is

$$\int_0^1\int_0^1\int_0^1 (5x_1x_2^3 + 3x_2x_3^4)\,8x_1x_2x_3\,dx_1\,dx_2\,dx_3 = 2.$$

Let $Y$ be the maximum of $X_1$, $X_2$, and $X_3$. Then, for instance, we have

$$\Pr(Y \le \tfrac{1}{2}) = \Pr(X_1 \le \tfrac{1}{2}, X_2 \le \tfrac{1}{2}, X_3 \le \tfrac{1}{2}) = \int_0^{1/2}\int_0^{1/2}\int_0^{1/2} 8x_1x_2x_3\,dx_1\,dx_2\,dx_3 = \left(\tfrac{1}{2}\right)^6 = \tfrac{1}{64}.$$

In a similar manner, we find that the distribution function of $Y$ is

$$G(y) = \Pr(Y \le y) = 0, \quad y < 0,$$
$$= y^6, \quad 0 \le y < 1,$$
$$= 1, \quad 1 \le y.$$

Accordingly, the p.d.f. of $Y$ is

$$g(y) = 6y^5, \quad 0 < y < 1,$$
$$= 0 \text{ elsewhere.}$$
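The two computations in Example 2 can be reproduced with exact arithmetic. The sketch below uses the fact that, for the p.d.f. $f(x) = 2x$ on $(0, 1)$, $E(X^k) = \int_0^1 2x^{k+1}\,dx = 2/(k + 2)$ and $F(x) = x^2$; the function names are illustrative only.

```python
from fractions import Fraction


def moment(k):
    """E(X^k) for the p.d.f. f(x) = 2x on (0, 1), namely 2 / (k + 2)."""
    return Fraction(2, k + 2)


def expected_value():
    """E(5 X1 X2^3 + 3 X2 X3^4), splitting each product by independence."""
    return 5 * moment(1) * moment(3) + 3 * moment(1) * moment(4)


def cdf_max(y):
    """G(y) = Pr(max(X1, X2, X3) <= y) = [F(y)]^3 with F(y) = y^2 on [0, 1)."""
    y = Fraction(y)
    if y < 0:
        return Fraction(0)
    if y >= 1:
        return Fraction(1)
    return y ** 6


if __name__ == "__main__":
    print(expected_value(), cdf_max(Fraction(1, 2)))
```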


Remark. Unless there is a possible misunderstanding between mutual and pairwise independence, we usually drop the modifier mutual. Accordingly, using this practice in Example 2, we say that $X_1$, $X_2$, $X_3$ are independent random variables, meaning that they are mutually independent. Occasionally, for emphasis, we use mutually independent so that the reader is reminded that this is different from pairwise independence.


EXERCISES 

2.37. Let $X$, $Y$, $Z$ have joint p.d.f. $f(x, y, z) = 2(x + y + z)/3$, $0 < x < 1$, $0 < y < 1$, $0 < z < 1$, zero elsewhere.

(a) Find the marginal probability density functions.

(b) Compute $\Pr(0 < X < \frac{1}{2}, 0 < Y < \frac{1}{2}, 0 < Z < \frac{1}{2})$ and $\Pr(0 < X < \frac{1}{2}) = \Pr(0 < Y < \frac{1}{2}) = \Pr(0 < Z < \frac{1}{2})$.

(c) Are $X$, $Y$, and $Z$ independent?

(d) Calculate $E(X^2YZ + 3XY^4Z^2)$.

(e) Determine the distribution function of $X$, $Y$, and $Z$.

(f) Find the conditional distribution of $X$ and $Y$, given $Z = z$, and evaluate $E(X + Y \mid z)$.

(g) Determine the conditional distribution of $X$, given $Y = y$ and $Z = z$, and compute $E(X \mid y, z)$.

2.38. Let $f(x_1, x_2, x_3) = \exp[-(x_1 + x_2 + x_3)]$, $0 < x_1 < \infty$, $0 < x_2 < \infty$, $0 < x_3 < \infty$, zero elsewhere, be the joint p.d.f. of $X_1$, $X_2$, $X_3$.

(a) Compute $\Pr(X_1 < X_2 < X_3)$ and $\Pr(X_1 = X_2 < X_3)$.

(b) Determine the m.g.f. of $X_1$, $X_2$, and $X_3$. Are these random variables independent?

2.39. Let $X_1$, $X_2$, $X_3$, and $X_4$ be four independent random variables, each with p.d.f. $f(x) = 3(1 - x)^2$, $0 < x < 1$, zero elsewhere. If $Y$ is the minimum of these four variables, find the distribution function and the p.d.f. of $Y$.

2.40. A fair die is cast at random three independent times. Let the random variable $X_i$ be equal to the number of spots that appear on the $i$th trial, $i = 1, 2, 3$. Let the random variable $Y$ be equal to $\max(X_i)$. Find the distribution function and the p.d.f. of $Y$.
Hint: $\Pr(Y \le y) = \Pr(X_i \le y,\ i = 1, 2, 3)$.

2.41. Let $M(t_1, t_2, t_3)$ be the m.g.f. of the random variables $X_1$, $X_2$, and $X_3$ of Bernstein's example, described in the remark preceding Example 2 of this section. Show that $M(t_1, t_2, 0) = M(t_1, 0, 0)M(0, t_2, 0)$, $M(t_1, 0, t_3) = M(t_1, 0, 0)M(0, 0, t_3)$, $M(0, t_2, t_3) = M(0, t_2, 0)M(0, 0, t_3)$, but $M(t_1, t_2, t_3) \ne M(t_1, 0, 0)M(0, t_2, 0)M(0, 0, t_3)$. Thus $X_1$, $X_2$, $X_3$ are pairwise independent but not mutually independent.

2.42. Let $X_1$, $X_2$, and $X_3$ be three random variables with means, variances, and correlation coefficients, denoted by $\mu_1, \mu_2, \mu_3$; $\sigma_1^2, \sigma_2^2, \sigma_3^2$; and $\rho_{12}, \rho_{13}, \rho_{23}$, respectively. If $E(X_1 - \mu_1 \mid x_2, x_3) = b_2(x_2 - \mu_2) + b_3(x_3 - \mu_3)$, where $b_2$ and $b_3$ are constants, determine $b_2$ and $b_3$ in terms of the variances and the correlation coefficients.


ADDITIONAL EXERCISES 

2.43. Find $\Pr[X_1X_2 \le 2]$, where $X_1$ and $X_2$ are independent and each has the distribution with p.d.f. $f(x) = 1$, $1 < x < 2$, zero elsewhere.

2.44. Let the joint p.d.f. of $X$ and $Y$ be given by $f(x, y) = \dfrac{2}{(1 + x + y)^3}$, $0 < x < \infty$, $0 < y < \infty$, zero elsewhere.

(a) Compute the marginal p.d.f. of $X$ and the conditional p.d.f. of $Y$, given $X = x$.

(b) For a fixed $X = x$, compute $E(1 + x + Y \mid x)$ and use the result to compute $E(Y \mid x)$.

2.45. Let $X_1$, $X_2$, $X_3$ be independent and each have a distribution with p.d.f. $f(x) = \exp(-x)$, $0 < x < \infty$, zero elsewhere. Evaluate:

(a) $\Pr(X_1 < X_2 \mid X_1 < 2X_2)$.

(b) $\Pr(X_1 < X_2 < X_3 \mid X_3 < 1)$.

2.46. Let $X$ and $Y$ be random variables with space consisting of the four points: $(0, 0)$, $(1, 1)$, $(1, 0)$, $(1, -1)$. Assign positive probabilities to these four points so that the correlation coefficient is equal to zero. Are $X$ and $Y$ independent?

2.47. Two line segments, each of length 2 units, are placed along the $x$-axis. The midpoint of the first is between $x = 0$ and $x = 14$ and that of the second is between $x = 6$ and $x = 20$. Assuming independence and uniform distributions for these midpoints, find the probability that the line segments overlap.

2.48. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = \frac{1}{7}$, $(x, y) = (0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2), (2, 2)$, and zero elsewhere. Find the correlation coefficient $\rho$.

2.49. Let $X_1$ and $X_2$ have the joint p.d.f. described by the following table:

(x1, x2)     (0,0)   (0,1)   (0,2)   (1,1)   (1,2)   (2,2)
f(x1, x2)    1/12    2/12    1/12    3/12    4/12    1/12

Find $f_1(x_1)$, $f_2(x_2)$, $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

2.50. If the discrete random variables $X_1$ and $X_2$ have joint p.d.f. $f(x_1, x_2) = (3x_1 + x_2)/24$, $(x_1, x_2) = (1, 1), (1, 2), (2, 1), (2, 2)$, zero elsewhere, find the conditional mean $E(X_2 \mid x_1)$, when $x_1 = 1$.

2.51. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = 21x^2y^3$, $0 < x < y < 1$, zero elsewhere. Find the conditional mean $E(Y \mid x)$ of $Y$, given $X = x$.

2.52. Let $X_1$ and $X_2$ have the p.d.f. $f(x_1, x_2) = x_1 + x_2$, $0 < x_1 < 1$, $0 < x_2 < 1$, zero elsewhere. Evaluate $\Pr(X_1/X_2 \le 2)$.

2.53. Cast a fair die and let $X = 0$ if 1, 2, or 3 spots appear, let $X = 1$ if 4 or 5 spots appear, and let $X = 2$ if 6 spots appear. Do this two independent times, obtaining $X_1$ and $X_2$. Calculate $\Pr(|X_1 - X_2| = 1)$.

2.54. Let $\sigma_1^2 = \sigma_2^2 = \sigma^2$ be the common variance of $X_1$ and $X_2$ and let $\rho$ be the correlation coefficient of $X_1$ and $X_2$. Show that

$$\Pr[|(X_1 - \mu_1) + (X_2 - \mu_2)| \ge k\sigma] \le \frac{2(1 + \rho)}{k^2}.$$









CHAPTER 3

Some Special Distributions

3.1 The Binomial and Related Distributions

In Chapter 1 we introduced the uniform distribution and the hypergeometric distribution. In this chapter we discuss some other important distributions of random variables frequently used in statistics. We begin with the binomial and related distributions.

A Bernoulli experiment is a random experiment, the outcome of which can be classified in but one of two mutually exclusive and exhaustive ways, say, success or failure (e.g., female or male, life or death, nondefective or defective). A sequence of Bernoulli trials occurs when a Bernoulli experiment is performed several independent times so that the probability of success, say $p$, remains the same from trial to trial. That is, in such a sequence, we let $p$ denote the probability of success on each trial.

Let $X$ be a random variable associated with a Bernoulli trial by defining it as follows:

$$X(\text{success}) = 1 \quad \text{and} \quad X(\text{failure}) = 0.$$

That is, the two outcomes, success and failure, are denoted by one and zero, respectively. The p.d.f. of $X$ can be written as

$$f(x) = p^x(1 - p)^{1 - x}, \quad x = 0, 1,$$

and we say that $X$ has a Bernoulli distribution. The expected value of $X$ is

$$\mu = E(X) = \sum_{x=0}^{1} x\,p^x(1 - p)^{1 - x} = (0)(1 - p) + (1)(p) = p,$$

and the variance of $X$ is

$$\sigma^2 = \operatorname{var}(X) = \sum_{x=0}^{1} (x - p)^2 p^x(1 - p)^{1 - x} = p^2(1 - p) + (1 - p)^2 p = p(1 - p).$$

It follows that the standard deviation of $X$ is $\sigma = \sqrt{p(1 - p)}$.


In a sequence of $n$ Bernoulli trials, we shall let $X_i$ denote the Bernoulli random variable associated with the $i$th trial. An observed sequence of $n$ Bernoulli trials will then be an $n$-tuple of zeros and ones. In such a sequence of Bernoulli trials, we are often interested in the total number of successes and not in the order of their occurrence. If we let the random variable $X$ equal the number of observed successes in $n$ Bernoulli trials, the possible values of $X$ are $0, 1, 2, \ldots, n$. If $x$ successes occur, where $x = 0, 1, 2, \ldots, n$, then $n - x$ failures occur. The number of ways of selecting $x$ positions for the $x$ successes in the $n$ trials is

$$\binom{n}{x} = \frac{n!}{x!\,(n - x)!}.$$

Since the trials are independent and since the probabilities of success and failure on each trial are, respectively, $p$ and $1 - p$, the probability of each of these ways is $p^x(1 - p)^{n - x}$. Thus the p.d.f. of $X$, say $f(x)$, is the sum of the probabilities of these $\binom{n}{x}$ mutually exclusive events; that is,

$$f(x) = \binom{n}{x} p^x(1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n,$$
$$= 0 \text{ elsewhere.}$$

Recall, if $n$ is a positive integer, that

$$(a + b)^n = \sum_{x=0}^{n} \binom{n}{x} b^x a^{n - x}.$$

Thus it is clear that $f(x) \ge 0$ and that

$$\sum_{x} f(x) = \sum_{x=0}^{n} \binom{n}{x} p^x(1 - p)^{n - x} = [(1 - p) + p]^n = 1.$$

That is, $f(x)$ satisfies the conditions of being a p.d.f. of a random variable $X$ of the discrete type. A random variable $X$ that has a p.d.f. of the form of $f(x)$ is said to have a binomial distribution, and any such $f(x)$ is called a binomial p.d.f. A binomial distribution will be denoted by the symbol $b(n, p)$. The constants $n$ and $p$ are called the parameters of the binomial distribution. Thus, if we say that $X$ is $b(5, \frac{1}{3})$, we mean that $X$ has the binomial p.d.f.

$$f(x) = \binom{5}{x}\left(\tfrac{1}{3}\right)^x\left(\tfrac{2}{3}\right)^{5 - x}, \quad x = 0, 1, \ldots, 5,$$
$$= 0 \text{ elsewhere.}$$

The m.g.f. of a binomial distribution is easily found. It is

$$M(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx}\binom{n}{x} p^x(1 - p)^{n - x} = \sum_{x=0}^{n} \binom{n}{x}(pe^t)^x(1 - p)^{n - x} = [(1 - p) + pe^t]^n$$

for all real values of $t$. The mean $\mu$ and the variance $\sigma^2$ of $X$ may be computed from $M(t)$. Since

$$M'(t) = n[(1 - p) + pe^t]^{n - 1}(pe^t)$$

and

$$M''(t) = n[(1 - p) + pe^t]^{n - 1}(pe^t) + n(n - 1)[(1 - p) + pe^t]^{n - 2}(pe^t)^2,$$

it follows that

$$\mu = M'(0) = np$$

and

$$\sigma^2 = M''(0) - \mu^2 = np + n(n - 1)p^2 - (np)^2 = np(1 - p).$$

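Example 1. Let $X$ be the number of heads (successes) in $n = 7$ independent tosses of an unbiased coin. The p.d.f. of $X$ is

$$f(x) = \binom{7}{x}\left(\tfrac{1}{2}\right)^x\left(1 - \tfrac{1}{2}\right)^{7 - x}, \quad x = 0, 1, 2, \ldots, 7,$$
$$= 0 \text{ elsewhere.}$$

Then $X$ has the m.g.f.

$$M(t) = \left(\tfrac{1}{2} + \tfrac{1}{2}e^t\right)^7,$$

has mean $\mu = np = \frac{7}{2}$, and has variance $\sigma^2 = np(1 - p) = \frac{7}{4}$. Furthermore, we have

$$\Pr(0 \le X \le 1) = \sum_{x=0}^{1} f(x) = \frac{1}{128} + \frac{7}{128} = \frac{8}{128}$$

and

$$\Pr(X = 5) = f(5) = \frac{7!}{5!\,2!}\left(\tfrac{1}{2}\right)^5\left(\tfrac{1}{2}\right)^2 = \frac{21}{128}.$$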
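The figures in Example 1 can be reproduced directly from the binomial p.d.f. The sketch below is a minimal illustration using exact fractions; the function name `binomial_pdf` and the variable names are choices made here, not notation from the text.

```python
from fractions import Fraction
from math import comb


def binomial_pdf(x, n, p):
    """f(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, ..., n, zero elsewhere."""
    if not 0 <= x <= n:
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Example 1: n = 7 tosses with success probability 1/2.
n, p = 7, Fraction(1, 2)
pdf = [binomial_pdf(x, n, p) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(pdf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pdf))

if __name__ == "__main__":
    print(sum(pdf), mean, variance)   # total probability, np, np(1 - p)
    print(pdf[0] + pdf[1], pdf[5])    # Pr(0 <= X <= 1), Pr(X = 5)
```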


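Example 2. If the m.g.f. of a random variable $X$ is

$$M(t) = \left(\tfrac{2}{3} + \tfrac{1}{3}e^t\right)^5,$$

then $X$ has a binomial distribution with $n = 5$ and $p = \frac{1}{3}$; that is, the p.d.f. of $X$ is

$$f(x) = \binom{5}{x}\left(\tfrac{1}{3}\right)^x\left(\tfrac{2}{3}\right)^{5 - x}, \quad x = 0, 1, 2, \ldots, 5,$$
$$= 0 \text{ elsewhere.}$$

Here $\mu = np = \frac{5}{3}$ and $\sigma^2 = np(1 - p) = \frac{10}{9}$.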

Example 3. If $Y$ is $b(n, \frac{1}{3})$, then $\Pr(Y \ge 1) = 1 - \Pr(Y = 0) = 1 - (\frac{2}{3})^n$. Suppose that we wish to find the smallest value of $n$ that yields $\Pr(Y \ge 1) > 0.80$. We have $1 - (\frac{2}{3})^n > 0.80$ and $0.20 > (\frac{2}{3})^n$. Either by inspection or by use of logarithms, we see that $n = 4$ is the solution. That is, the probability of at least one success throughout $n = 4$ independent repetitions of a random experiment with probability of success $p = \frac{1}{3}$ is greater than 0.80.
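The "by inspection" step in Example 3 amounts to a short search. As a sketch (the function name `smallest_n` is hypothetical, and it assumes $0 < p < 1$ and a target below 1 so that the loop terminates):

```python
from fractions import Fraction


def smallest_n(p, target):
    """Smallest n with Pr(Y >= 1) = 1 - (1 - p)^n strictly greater than target.

    Assumes 0 < p < 1 and target < 1, so the loop is guaranteed to stop.
    """
    n = 1
    while 1 - (1 - p) ** n <= target:
        n += 1
    return n


if __name__ == "__main__":
    # Example 3: p = 1/3 and the requirement Pr(Y >= 1) > 0.80.
    print(smallest_n(Fraction(1, 3), Fraction(4, 5)))
```

With $p = \frac{1}{3}$, the values of $1 - (\frac{2}{3})^n$ for $n = 3$ and $n = 4$ are $\frac{19}{27}$ and $\frac{65}{81}$, which bracket 0.80.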

Example 4. Let the random variable $Y$ be equal to the number of successes throughout $n$ independent repetitions of a random experiment with probability $p$ of success. That is, $Y$ is $b(n, p)$. The ratio $Y/n$ is called the relative frequency of success. For every $\epsilon > 0$, we have

$$\Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) = \Pr(|Y - np| \ge \epsilon n) = \Pr\left(|Y - \mu| \ge \epsilon\sqrt{\frac{n}{p(1 - p)}}\,\sigma\right),$$

where $\mu = np$ and $\sigma^2 = np(1 - p)$. In accordance with Chebyshev's inequality with $k = \epsilon\sqrt{n/p(1 - p)}$, we have

$$\Pr\left(|Y - \mu| \ge \epsilon\sqrt{\frac{n}{p(1 - p)}}\,\sigma\right) \le \frac{p(1 - p)}{n\epsilon^2}$$

and hence

$$\Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) \le \frac{p(1 - p)}{n\epsilon^2}.$$

Now, for every fixed $\epsilon > 0$, the right-hand member of the preceding inequality is close to zero for sufficiently large $n$. That is,

$$\lim_{n \to \infty} \Pr\left(\left|\frac{Y}{n} - p\right| \ge \epsilon\right) = 0$$

and

$$\lim_{n \to \infty} \Pr\left(\left|\frac{Y}{n} - p\right| < \epsilon\right) = 1.$$

Since this is true for every fixed $\epsilon > 0$, we see, in a certain sense, that the relative frequency of success is, for large values of $n$, close to the probability $p$ of success. This result is one form of the law of large numbers. It was alluded to in the initial discussion of probability in Chapter 1 and will be considered again, along with related concepts, in Chapter 5.
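To see how the Chebyshev bound of Example 4 compares with the exact tail probability, the sketch below sums the binomial p.d.f. over the values of $y$ with $|y/n - p| \ge \epsilon$. The function names and the particular values of $n$, $p$, and $\epsilon$ used in the demonstration are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def exact_tail(n, p, eps):
    """Pr(|Y/n - p| >= eps) for Y distributed b(n, p), by direct summation."""
    return sum(
        comb(n, y) * p ** y * (1 - p) ** (n - y)
        for y in range(n + 1)
        if abs(Fraction(y, n) - p) >= eps
    )


def chebyshev_bound(n, p, eps):
    """Upper bound p(1 - p) / (n eps^2) obtained from Chebyshev's inequality."""
    return p * (1 - p) / (n * eps ** 2)


if __name__ == "__main__":
    p, eps = Fraction(1, 2), Fraction(1, 10)
    for n in (10, 100, 400):
        print(n, float(exact_tail(n, p, eps)), float(chebyshev_bound(n, p, eps)))
```

The bound is often far from tight (it can even exceed 1 for small $n$), but it is enough to force the tail probability to zero as $n$ grows, which is all the argument needs.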

Example 5. Let the independent random variables $X_1$, $X_2$, $X_3$ have the same distribution function $F(x)$. Let $Y$ be the middle value of $X_1$, $X_2$, $X_3$. To determine the distribution function of $Y$, say $G(y) = \Pr(Y \le y)$, we note that $Y \le y$ if and only if at least two of the random variables $X_1$, $X_2$, $X_3$ are less than or equal to $y$. Let us say that the $i$th "trial" is a success if $X_i \le y$, $i = 1, 2, 3$; here each "trial" has the probability of success $F(y)$. In this terminology, $G(y) = \Pr(Y \le y)$ is then the probability of at least two successes in three independent trials. Thus

$$G(y) = \binom{3}{2}[F(y)]^2[1 - F(y)] + [F(y)]^3.$$

If $F(x)$ is a continuous type of distribution function so that the p.d.f. of $X$ is $F'(x) = f(x)$, then the p.d.f. of $Y$ is

$$g(y) = G'(y) = 6[F(y)][1 - F(y)]f(y).$$
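The argument for $G(y)$ in Example 5 does not depend on the type of distribution, so it can be checked by brute force on a discrete case. The sketch below is an illustration only: it assumes three independent fair $m$-sided dice (a choice made here, not in the text), counts the outcomes whose middle value is at most $y$, and compares with $3[F(y)]^2[1 - F(y)] + [F(y)]^3$ where $F(y) = y/m$.

```python
from fractions import Fraction
from itertools import product


def median_cdf_formula(F):
    """G(y) = 3 F^2 (1 - F) + F^3: at least two successes in three trials."""
    return 3 * F ** 2 * (1 - F) + F ** 3


def median_cdf_enumerated(y, m=6):
    """Pr(middle value <= y) for three independent fair m-sided dice."""
    hits = sum(
        1 for t in product(range(1, m + 1), repeat=3) if sorted(t)[1] <= y
    )
    return Fraction(hits, m ** 3)


if __name__ == "__main__":
    for y in range(1, 7):
        print(y, median_cdf_enumerated(y), median_cdf_formula(Fraction(y, 6)))
```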

Example 6. Consider a sequence of independent repetitions of a random experiment with constant probability $p$ of success. Let the random variable $Y$ denote the total number of failures in this sequence before the $r$th success; that is, $Y + r$ is equal to the number of trials necessary to produce exactly $r$ successes. Here $r$ is a fixed positive integer. To determine the p.d.f. of $Y$, let $y$ be an element of $\{y : y = 0, 1, 2, \ldots\}$. Then, by the multiplication rule of probabilities, $\Pr(Y = y) = g(y)$ is equal to the product of the probability

$$\binom{y + r - 1}{r - 1} p^{r - 1}(1 - p)^y$$

of obtaining exactly $r - 1$ successes in the first $y + r - 1$ trials and the probability $p$ of a success on the $(y + r)$th trial. Thus the p.d.f. $g(y)$ of $Y$ is given by

$$g(y) = \binom{y + r - 1}{r - 1} p^r(1 - p)^y, \quad y = 0, 1, 2, \ldots,$$
$$= 0 \text{ elsewhere.}$$

A distribution with a p.d.f. of the form $g(y)$ is called a negative binomial distribution; and any such $g(y)$ is called a negative binomial p.d.f. The distribution derives its name from the fact that $g(y)$ is a general term in the expansion of $p^r[1 - (1 - p)]^{-r}$. It is left as an exercise to show that the m.g.f. of this distribution is $M(t) = p^r[1 - (1 - p)e^t]^{-r}$, for $t < -\ln(1 - p)$. If $r = 1$, then $Y$ has the p.d.f.

$$g(y) = p(1 - p)^y, \quad y = 0, 1, 2, \ldots,$$

zero elsewhere, and the m.g.f. $M(t) = p[1 - (1 - p)e^t]^{-1}$. In this special case, $r = 1$, we say that $Y$ has a geometric distribution.
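A short sketch of the negative binomial p.d.f. of Example 6 follows. The function name `neg_binomial_pdf` and the parameter values in the demonstration are illustrative assumptions. The demonstration checks that a long partial sum of $g(y)$ is numerically close to 1 and that $r = 1$ reduces to the geometric p.d.f. $p(1 - p)^y$.

```python
from fractions import Fraction
from math import comb


def neg_binomial_pdf(y, r, p):
    """g(y) = C(y + r - 1, r - 1) p^r (1 - p)^y for y = 0, 1, 2, ..."""
    if y < 0:
        return Fraction(0)
    return comb(y + r - 1, r - 1) * p ** r * (1 - p) ** y


if __name__ == "__main__":
    r, p = 3, Fraction(1, 3)
    partial = sum(neg_binomial_pdf(y, r, p) for y in range(200))
    print(float(partial))  # partial sum over y = 0, ..., 199
    print([neg_binomial_pdf(y, 1, p) == p * (1 - p) ** y for y in range(5)])
```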

The binomial distribution is generalized to the multinomial distribution as follows. Let a random experiment be repeated $n$ independent times. On each repetition, the experiment terminates in but one of $k$ mutually exclusive and exhaustive ways, say $C_1, C_2, \ldots, C_k$. Let $p_i$ be the probability that the outcome is an element of $C_i$ and let $p_i$ remain constant throughout the $n$ independent repetitions, $i = 1, 2, \ldots, k$. Define the random variable $X_i$ to be equal to the number of outcomes that are elements of $C_i$, $i = 1, 2, \ldots, k - 1$. Furthermore, let $x_1, x_2, \ldots, x_{k-1}$ be nonnegative integers so that $x_1 + x_2 + \cdots + x_{k-1} \le n$. Then the probability that exactly $x_1$ terminations of the experiment are in $C_1$, ..., exactly $x_{k-1}$ terminations are in $C_{k-1}$, and hence exactly $n - (x_1 + \cdots + x_{k-1})$ terminations are in $C_k$ is

$$\frac{n!}{x_1! \cdots x_{k-1}!\,x_k!}\,p_1^{x_1} \cdots p_{k-1}^{x_{k-1}} p_k^{x_k},$$

where $x_k$ is merely an abbreviation for $n - (x_1 + \cdots + x_{k-1})$. This is the multinomial p.d.f. of $k - 1$ random variables $X_1, X_2, \ldots, X_{k-1}$ of the discrete type. To see that this is correct, note that the number of distinguishable arrangements of $x_1$ $C_1$'s, $x_2$ $C_2$'s, ..., $x_k$ $C_k$'s is

$$\binom{n}{x_1}\binom{n - x_1}{x_2} \cdots \binom{n - x_1 - \cdots - x_{k-2}}{x_{k-1}} = \frac{n!}{x_1!\,x_2! \cdots x_k!}$$

and that the probability of each of these distinguishable arrangements is

$$p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}.$$

Hence the product of these two latter expressions gives the correct probability, which is in agreement with the formula for the multinomial p.d.f.

When $k = 3$, we often let $X = X_1$ and $Y = X_2$; then $n - X - Y = X_3$. We say that $X$ and $Y$ have a trinomial distribution. The joint p.d.f. of $X$ and $Y$ is

$$f(x, y) = \frac{n!}{x!\,y!\,(n - x - y)!}\,p_1^x p_2^y p_3^{n - x - y},$$

where $x$ and $y$ are nonnegative integers with $x + y \le n$, and $p_1$, $p_2$, and $p_3$ are positive proper fractions with $p_1 + p_2 + p_3 = 1$; and let $f(x, y) = 0$ elsewhere. Accordingly, $f(x, y)$ satisfies the conditions of being a joint p.d.f. of two random variables $X$ and $Y$ of the discrete type; that is, $f(x, y)$ is nonnegative and its sum over all points $(x, y)$ at which $f(x, y)$ is positive is equal to $(p_1 + p_2 + p_3)^n = 1$.

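If $n$ is a positive integer and $a_1$, $a_2$, $a_3$ are fixed constants, we have

$$\sum_{x=0}^{n}\sum_{y=0}^{n - x} \frac{n!}{x!\,y!\,(n - x - y)!}\,a_1^x a_2^y a_3^{n - x - y} = \sum_{x=0}^{n} \frac{n!\,a_1^x}{x!\,(n - x)!}\sum_{y=0}^{n - x} \frac{(n - x)!}{y!\,(n - x - y)!}\,a_2^y a_3^{n - x - y}$$

$$= \sum_{x=0}^{n} \frac{n!}{x!\,(n - x)!}\,a_1^x(a_2 + a_3)^{n - x} = (a_1 + a_2 + a_3)^n. \tag{1}$$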

Consequently, the m.g.f. of a trinomial distribution, in accordance with Equation (1), is given by

$$M(t_1, t_2) = \sum_{x=0}^{n}\sum_{y=0}^{n - x} \frac{n!}{x!\,y!\,(n - x - y)!}\,(p_1e^{t_1})^x(p_2e^{t_2})^y p_3^{n - x - y} = (p_1e^{t_1} + p_2e^{t_2} + p_3)^n,$$

for all real values of $t_1$ and $t_2$. The moment-generating functions of the marginal distributions of $X$ and $Y$ are, respectively,

$$M(t_1, 0) = (p_1e^{t_1} + p_2 + p_3)^n = [(1 - p_1) + p_1e^{t_1}]^n$$

and

$$M(0, t_2) = (p_1 + p_2e^{t_2} + p_3)^n = [(1 - p_2) + p_2e^{t_2}]^n.$$

We see immediately, from Theorem 4, Section 2.4, that $X$ and $Y$ are dependent random variables. In addition, $X$ is $b(n, p_1)$ and $Y$ is $b(n, p_2)$. Accordingly, the means and the variances of $X$ and $Y$ are, respectively, $\mu_1 = np_1$, $\mu_2 = np_2$, $\sigma_1^2 = np_1(1 - p_1)$, and $\sigma_2^2 = np_2(1 - p_2)$.

Consider next the conditional p.d.f. of $Y$, given $X = x$. We have

$$f_{2|1}(y \mid x) = \frac{(n - x)!}{y!\,(n - x - y)!}\left(\frac{p_2}{1 - p_1}\right)^y\left(\frac{p_3}{1 - p_1}\right)^{n - x - y}, \quad y = 0, 1, \ldots, n - x,$$
$$= 0 \text{ elsewhere.}$$

Thus the conditional distribution of $Y$, given $X = x$, is $b[n - x, p_2/(1 - p_1)]$. Hence the conditional mean of $Y$, given $X = x$, is the linear function

$$E(Y \mid x) = (n - x)\left(\frac{p_2}{1 - p_1}\right).$$

We also find that the conditional distribution of $X$, given $Y = y$, is $b[n - y, p_1/(1 - p_2)]$ and thus

$$E(X \mid y) = (n - y)\left(\frac{p_1}{1 - p_2}\right).$$

Now recall (Example 2, Section 2.3) that the square of the correlation coefficient, say $\rho^2$, is equal to the product of $-p_2/(1 - p_1)$ and $-p_1/(1 - p_2)$, the coefficients of $x$ and $y$ in the respective conditional means. Since both of these coefficients are negative (and thus $\rho$ is negative), we have

$$\rho = -\sqrt{\frac{p_1p_2}{(1 - p_1)(1 - p_2)}}.$$
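The expression for $\rho$ can be checked by enumerating a small trinomial distribution. In the sketch below the function names and the parameter values $n = 5$, $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{3}$ are illustrative assumptions; the code computes the covariance and $\rho^2$ exactly from the joint p.d.f. and compares $\rho^2$ with $p_1p_2/[(1 - p_1)(1 - p_2)]$.

```python
from fractions import Fraction
from math import factorial


def trinomial_pdf(x, y, n, p1, p2):
    """f(x, y) = n! / (x! y! (n-x-y)!) p1^x p2^y p3^(n-x-y), p3 = 1 - p1 - p2."""
    p3 = 1 - p1 - p2
    coef = factorial(n) // (factorial(x) * factorial(y) * factorial(n - x - y))
    return coef * p1 ** x * p2 ** y * p3 ** (n - x - y)


def correlation_summary(n, p1, p2):
    """Return (covariance, rho squared) of X and Y by exact enumeration."""
    points = [(x, y) for x in range(n + 1) for y in range(n - x + 1)]
    pdf = {(x, y): trinomial_pdf(x, y, n, p1, p2) for x, y in points}
    mean_x = sum(x * f for (x, _), f in pdf.items())
    mean_y = sum(y * f for (_, y), f in pdf.items())
    cov = sum((x - mean_x) * (y - mean_y) * f for (x, y), f in pdf.items())
    var_x = sum((x - mean_x) ** 2 * f for (x, _), f in pdf.items())
    var_y = sum((y - mean_y) ** 2 * f for (_, y), f in pdf.items())
    return cov, cov ** 2 / (var_x * var_y)


if __name__ == "__main__":
    p1, p2 = Fraction(1, 2), Fraction(1, 3)
    cov, rho_sq = correlation_summary(5, p1, p2)
    print(cov, rho_sq, p1 * p2 / ((1 - p1) * (1 - p2)))
```

The negative covariance confirms the sign argument in the text: a larger count in one category leaves fewer trials for the other.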



In general, the m.g.f. of a multinomial distribution is given by

$$M(t_1, \ldots, t_{k-1}) = (p_1e^{t_1} + \cdots + p_{k-1}e^{t_{k-1}} + p_k)^n$$

for all real values of $t_1, t_2, \ldots, t_{k-1}$. Thus each one-variable marginal p.d.f. is binomial, each two-variable marginal p.d.f. is trinomial, and so on.


EXERCISES 


3 丄 If the m.gX of a random variable A" is (} + |〆) 5 , find Pt(X= 2 or 3). 
3*2. The of a random variable A" is (| + Show that 


PrQi —2a < X < fi + 2<r) 




I, cm) 


3.3. If X is b(n, p\ show that 



3.4. Let the independent random variables X u X 2 , have the same p.d.f 

J(x) — 3X 2 ,0 < x < 1 ， zero elsewhere. Find the probability that exactly two 
of these three variables exceed 

t 

3.5. Let Y be the number of successes in n independent repetitions of a 

random experiment having the probability of success If n = 3, 

compute Pr (2 < Y);if n = 5, compute Pr (3 < Y). 

3.6* Let F be the number of successes throughout n independent repetitions 

of a random experiment having probability of success p-^ Determine the 
smallest value of n so that Pr (1 ^ y> > 0.70. 

3.7* Let the independent random variables X l and X 2 have binomial 
distributions with parameters n } =3,p t and n 2 4, p 2 ^ ^ respectively. 
Compute Pr (X t - X z ). 

Hint: List the four mutually exclusive ways that X\ X 2 and compute 
the probability of each. 

3.8. Toss two nickels and three dimes at random. Make appropriate 
assumptions and compute the probability that there are more heads 
showing on the nickels than on the dimes* 

3.9. Let $X_1, X_2, \ldots, X_{k-1}$ have a multinomial distribution.

(a) Find the m.g.f. of $X_2, X_3, \ldots, X_{k-1}$.

(b) What is the p.d.f. of $X_2, X_3, \ldots, X_{k-1}$?

(c) Determine the conditional p.d.f. of $X_1$, given that
$X_2 = x_2, \ldots, X_{k-1} = x_{k-1}$.

(d) What is the conditional expectation $E(X_1 \mid x_2, \ldots, x_{k-1})$?

3.10. Let $X$ be $b(2, p)$ and let $Y$ be $b(4, p)$. If $\Pr(X \ge 1) = \frac{5}{9}$, find $\Pr(Y \ge 1)$.

3.11. If $x = r$ is the unique mode of a distribution that is $b(n, p)$, show that

$$(n + 1)p - 1 < r < (n + 1)p.$$

Hint: Determine the values of $x$ for which the ratio $f(x + 1)/f(x) > 1$.

3.12. Let $X$ have a binomial distribution with parameters $n$ and $p = \frac{1}{3}$.
Determine the smallest integer $n$ can be such that $\Pr(X \ge 1) \ge 0.85$.

3.13. Let $X$ have the p.d.f. $f(x) = \left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^x$, $x = 0, 1, 2, 3, \ldots$, zero elsewhere.
Find the conditional p.d.f. of $X$, given that $X \ge 3$.

3.14. One of the numbers $1, 2, \ldots, 6$ is to be chosen by casting an unbiased
die. Let this random experiment be repeated five independent times. Let the
random variable $X_1$ be the number of terminations in the set $\{x : x = 1, 2, 3\}$
and let the random variable $X_2$ be the number of terminations in the set
$\{x : x = 4, 5\}$. Compute $\Pr(X_1 = 2, X_2 = 1)$.

3.15. Show that the m.g.f. of the negative binomial distribution is
$M(t) = p^r[1 - (1 - p)e^t]^{-r}$. Find the mean and the variance of this
distribution.

Hint: In the summation representing $M(t)$, make use of the Maclaurin's
series for $(1 - w)^{-r}$.

3.16. Let $X_1$ and $X_2$ have a trinomial distribution. Differentiate the
moment-generating function to show that their covariance is $-np_1p_2$.

3.17. If a fair coin is tossed at random five independent times, find the
conditional probability of five heads relative to the hypothesis that there
are at least four heads.

3.18. Let an unbiased die be cast at random seven independent times.
Compute the conditional probability that each side appears at least once
relative to the hypothesis that side 1 appears exactly twice.

3.19. Compute the measures of skewness and kurtosis of the binomial 
distribution b(n, p). 

3.20. Let

$$f(x_1, x_2) = \binom{x_1}{x_2}\left(\frac{1}{2}\right)^{x_1}\left(\frac{x_1}{15}\right), \qquad x_2 = 0, 1, \ldots, x_1, \quad x_1 = 1, 2, 3, 4, 5,$$

zero elsewhere, be the joint p.d.f. of $X_1$ and $X_2$. Determine:

(a) $E(X_2)$.

(b) $u(x_1) = E(X_2 \mid x_1)$.

(c) $E[u(X_1)]$.

Compare the answers of parts (a) and (c).

Hint: Note that $E(X_2) = \sum_{x_1=1}^{5} \sum_{x_2=0}^{x_1} x_2 f(x_1, x_2)$ and use the fact that
$\sum_{y=0}^{n} y \binom{n}{y}\left(\frac{1}{2}\right)^n = n/2$. Why?

3.21. Three fair dice are cast. In 10 independent casts, let $X$ be the number
of times all three faces are alike and let $Y$ be the number of times only two
faces are alike. Find the joint p.d.f. of $X$ and $Y$ and compute $E(6XY)$.


3.2 The Poisson Distribution


Recall that the series

$$1 + m + \frac{m^2}{2!} + \frac{m^3}{3!} + \cdots = \sum_{x=0}^{\infty} \frac{m^x}{x!}$$

converges, for all values of $m$, to $e^m$. Consider the function $f(x)$ defined
by

$$f(x) = \begin{cases} \dfrac{m^x e^{-m}}{x!}, & x = 0, 1, 2, \ldots, \\[1ex] 0 & \text{elsewhere}, \end{cases}$$

where $m > 0$. Since $m > 0$, then $f(x) \ge 0$ and

$$\sum_x f(x) = \sum_{x=0}^{\infty} \frac{m^x e^{-m}}{x!} = e^{-m} \sum_{x=0}^{\infty} \frac{m^x}{x!} = e^{-m} e^{m} = 1;$$

that is, $f(x)$ satisfies the conditions of being a p.d.f. of a discrete type
of random variable. A random variable that has a p.d.f. of the form
$f(x)$ is said to have a Poisson distribution, and any such $f(x)$ is called
a Poisson p.d.f.
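A quick numerical illustration of the two p.d.f. conditions for the Poisson form: the sketch below evaluates $f(x) = m^x e^{-m}/x!$ and sums it over a long initial stretch of the nonnegative integers. The value $m = 2.5$ and the truncation at 60 terms are arbitrary choices made for this check.

```python
from math import exp, factorial

def poisson_pmf(x, m):
    """f(x) = m^x e^(-m) / x! for x = 0, 1, 2, ...; zero elsewhere."""
    if x < 0:
        return 0.0
    return m**x * exp(-m) / factorial(x)

m = 2.5  # illustrative value of the parameter m > 0
values = [poisson_pmf(x, m) for x in range(60)]
partial_sum = sum(values)   # should be very close to 1
print(min(values) >= 0.0, partial_sum)
```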

Remarks. Experience indicates that the Poisson p.d.f. may be used in a
number of applications with quite satisfactory results. For example, let the
random variable $X$ denote the number of alpha particles emitted by a
radioactive substance that enter a prescribed region during a prescribed
interval of time. With a suitable value of $m$, it is found that $X$ may be
assumed to have a Poisson distribution. Again let the random variable $X$
denote the number of defects on a manufactured article, such as a
refrigerator door. Upon examining many of these doors, it is found, with an
appropriate value of $m$, that $X$ may be said to have a Poisson distribution. The
number of automobile accidents in some unit of time (or the number of
insurance claims in some unit of time) is often assumed to be a random
variable which has a Poisson distribution. Each of these instances can be
thought of as a process that generates a number of changes (accidents,
claims, etc.) in a fixed interval (of time or space, etc.). If a process leads to
a Poisson distribution, that process is called a Poisson process. Some
assumptions that ensure a Poisson process will now be enumerated.

Let $g(x, w)$ denote the probability of $x$ changes in each interval of length
$w$. Furthermore, let the symbol $o(h)$ represent any function such that
$\lim_{h \to 0} [o(h)/h] = 0$; for example, $h^2 = o(h)$ and $o(h) + o(h) = o(h)$. The Poisson
postulates are the following:

1. $g(1, h) = \lambda h + o(h)$, where $\lambda$ is a positive constant and $h > 0$.

2. $\sum_{x=2}^{\infty} g(x, h) = o(h)$.

3. The numbers of changes in nonoverlapping intervals are independent.

Postulates 1 and 3 state, in effect, that the probability of one change in a
short interval $h$ is independent of changes in other nonoverlapping intervals
and is approximately proportional to the length of the interval. The substance
of postulate 2 is that the probability of two or more changes in the same short
interval $h$ is essentially equal to zero. If $x = 0$, we take $g(0, 0) = 1$. In
accordance with postulates 1 and 2, the probability of at least one change in
an interval of length $h$ is $\lambda h + o(h) + o(h) = \lambda h + o(h)$. Hence the probability
of zero changes in this interval of length $h$ is $1 - \lambda h - o(h)$. Thus the
probability $g(0, w + h)$ of zero changes in an interval of length $w + h$ is, in
accordance with postulate 3, equal to the product of the probability $g(0, w)$
of zero changes in an interval of length $w$ and the probability $[1 - \lambda h - o(h)]$
of zero changes in a nonoverlapping interval of length $h$. That is,

$$g(0, w + h) = g(0, w)[1 - \lambda h - o(h)].$$

Then

$$\frac{g(0, w + h) - g(0, w)}{h} = -\lambda g(0, w) - \frac{o(h)\,g(0, w)}{h}.$$

If we take the limit as $h \to 0$, we have

$$D_w[g(0, w)] = -\lambda g(0, w).$$

The solution of this differential equation is

$$g(0, w) = ce^{-\lambda w}.$$

The condition $g(0, 0) = 1$ implies that $c = 1$; so

$$g(0, w) = e^{-\lambda w}.$$

If $x$ is a positive integer, we take $g(x, 0) = 0$. The postulates imply that

$$g(x, w + h) = [g(x, w)][1 - \lambda h - o(h)] + [g(x - 1, w)][\lambda h + o(h)] + o(h).$$

Accordingly, we have

$$\frac{g(x, w + h) - g(x, w)}{h} = -\lambda g(x, w) + \lambda g(x - 1, w) + \frac{o(h)}{h}$$

and

$$D_w[g(x, w)] = -\lambda g(x, w) + \lambda g(x - 1, w),$$

for $x = 1, 2, 3, \ldots$. It can be shown, by mathematical induction, that the
solutions to these differential equations, with boundary conditions $g(x, 0) = 0$
for $x = 1, 2, 3, \ldots$, are, respectively,

$$g(x, w) = \frac{(\lambda w)^x e^{-\lambda w}}{x!}, \qquad x = 1, 2, 3, \ldots.$$

Hence the number of changes $X$ in an interval of length $w$ has a Poisson
distribution with parameter $m = \lambda w$.
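The induction itself is left to the reader, but the claimed solution can at least be checked numerically. The sketch below, with an assumed rate $\lambda = 0.7$ and interval length $w = 3$, estimates $D_w[g(x, w)]$ by a central difference and compares it with $-\lambda g(x, w) + \lambda g(x - 1, w)$; it is a plausibility check, not a proof.

```python
from math import exp, factorial

def g(x, w, lam):
    """Claimed solution g(x, w) = (lam*w)^x e^(-lam*w) / x!."""
    return (lam * w)**x * exp(-lam * w) / factorial(x)

def ode_residual(x, w, lam, h=1e-6):
    """Central-difference estimate of D_w[g(x, w)] minus the right-hand side
    -lam*g(x, w) + lam*g(x - 1, w) of the differential equation (x >= 1)."""
    derivative = (g(x, w + h, lam) - g(x, w - h, lam)) / (2 * h)
    return derivative - (-lam * g(x, w, lam) + lam * g(x - 1, w, lam))

lam, w = 0.7, 3.0  # illustrative rate and interval length (assumed values)
residuals = [ode_residual(x, w, lam) for x in range(1, 6)]
print(residuals)                        # each entry should be close to zero
print(g(0, 0.0, lam), g(3, 0.0, lam))   # boundary conditions g(0,0)=1, g(x,0)=0
```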

The m.g.f. of a Poisson distribution is given by

$$M(t) = \sum_x e^{tx} f(x) = \sum_{x=0}^{\infty} e^{tx} \frac{m^x e^{-m}}{x!} = e^{-m} \sum_{x=0}^{\infty} \frac{(me^t)^x}{x!} = e^{-m} e^{me^t} = e^{m(e^t - 1)}$$

for all real values of $t$. Since

$$M'(t) = e^{m(e^t - 1)}(me^t)$$

and

$$M''(t) = e^{m(e^t - 1)}(me^t) + e^{m(e^t - 1)}(me^t)^2,$$

then

$$\mu = M'(0) = m$$

and

$$\sigma^2 = M''(0) - \mu^2 = m + m^2 - m^2 = m.$$

That is, a Poisson distribution has $\mu = \sigma^2 = m > 0$. On this account,
a Poisson p.d.f. is frequently written

$$f(x) = \begin{cases} \dfrac{\mu^x e^{-\mu}}{x!}, & x = 0, 1, 2, \ldots, \\[1ex] 0 & \text{elsewhere}. \end{cases}$$

Thus the parameter $m$ in a Poisson p.d.f. is the mean $\mu$. Table I in
Appendix B gives approximately the distribution for various values of
the parameter $m = \mu$.
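The equality of the mean and the variance can also be seen numerically. This sketch computes both from a truncated sum over the p.d.f.; the value $m = 4$ and the truncation at 100 terms are assumptions made for the illustration.

```python
from math import exp, factorial

def poisson_moments(m, terms=100):
    """Mean and variance of a Poisson distribution with parameter m,
    computed from the first `terms` values of the p.d.f."""
    pmf = [m**x * exp(-m) / factorial(x) for x in range(terms)]
    mean = sum(x * f for x, f in enumerate(pmf))
    second_moment = sum(x * x * f for x, f in enumerate(pmf))
    return mean, second_moment - mean**2

mean, variance = poisson_moments(4.0)
print(mean, variance)   # both should be close to m = 4
```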


Example 1. Suppose that $X$ has a Poisson distribution with $\mu = 2$. Then
the p.d.f. of $X$ is

$$f(x) = \begin{cases} \dfrac{2^x e^{-2}}{x!}, & x = 0, 1, 2, \ldots, \\[1ex] 0 & \text{elsewhere}. \end{cases}$$

The variance of this distribution is $\sigma^2 = \mu = 2$. If we wish to compute
$\Pr(1 \le X)$, we have

$$\Pr(1 \le X) = 1 - \Pr(X = 0) = 1 - f(0) = 1 - e^{-2} = 0.865,$$

approximately, by Table I of Appendix B.


Example 2. If the m.g.f. of a random variable $X$ is

$$M(t) = e^{4(e^t - 1)},$$

then $X$ has a Poisson distribution with $\mu = 4$. Accordingly, by way of example,

$$\Pr(X = 3) = \frac{4^3 e^{-4}}{3!} = \frac{32}{3} e^{-4};$$

or, by Table I,

$$\Pr(X = 3) = \Pr(X \le 3) - \Pr(X \le 2) = 0.433 - 0.238 = 0.195.$$


Example 3. Let the probability of exactly one blemish in 1 foot of wire be
about $\frac{1}{1000}$ and let the probability of two or more blemishes in that length be,
for all practical purposes, zero. Let the random variable $X$ be the number of
blemishes in 3000 feet of wire. If we assume the independence of the numbers
of blemishes in nonoverlapping intervals, then the postulates of the Poisson
process are approximated, with $\lambda = \frac{1}{1000}$ and $w = 3000$. Thus $X$ has an
approximate Poisson distribution with mean $3000\left(\frac{1}{1000}\right) = 3$. For example, the
probability that there are exactly five blemishes in 3000 feet of wire is

$$\Pr(X = 5) = \frac{3^5 e^{-3}}{5!},$$

and by Table I,

$$\Pr(X = 5) = \Pr(X \le 5) - \Pr(X \le 4) = 0.101,$$

approximately.
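The table lookups in Examples 1 to 3 can be reproduced directly from the Poisson p.d.f. The helper names below are illustrative; the computed values are only expected to agree with Table I to about three decimal places.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Pr(X = x) for a Poisson distribution with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

def poisson_cdf(x, mu):
    """Pr(X <= x) for a Poisson distribution with mean mu."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

ex1 = 1 - poisson_pmf(0, 2)                          # Example 1: Pr(1 <= X), mu = 2
ex2 = poisson_pmf(3, 4)                              # Example 2: Pr(X = 3), mu = 4
ex2_table = poisson_cdf(3, 4) - poisson_cdf(2, 4)    # same value, the table way
ex3 = poisson_pmf(5, 3000 * (1 / 1000))              # Example 3: Pr(X = 5), mean 3
print(round(ex1, 3), round(ex2, 3), round(ex3, 3))
```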
EXERCISES


3.22. If the random variable $X$ has a Poisson distribution such that
$\Pr(X = 1) = \Pr(X = 2)$, find $\Pr(X = 4)$.

3.23. The m.g.f. of a random variable $X$ is $e^{4(e^t - 1)}$. Show that
$\Pr(\mu - 2\sigma < X < \mu + 2\sigma) = 0.931$.

3.24. In a lengthy manuscript, it is discovered that only 13.5 percent of the
pages contain no typing errors. If we assume that the number of errors per
page is a random variable with a Poisson distribution, find the percentage
of pages that have exactly one error.

3.25. Let the p.d.f. $f(x)$ be positive on and only on the nonnegative integers.
Given that $f(x) = (4/x)f(x - 1)$, $x = 1, 2, 3, \ldots$. Find $f(x)$.

Hint: Note that $f(1) = 4f(0)$, $f(2) = (4^2/2!)f(0)$, and so on. That is, find
each $f(x)$ in terms of $f(0)$ and then determine $f(0)$ from

$$1 = f(0) + f(1) + f(2) + \cdots.$$

3.26. Let $X$ have a Poisson distribution with $\mu = 100$. Use Chebyshev's
inequality to determine a lower bound for $\Pr(75 < X < 125)$.

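3.27. Given that $g(x, 0) = 0$ and that

$$D_w[g(x, w)] = -\lambda g(x, w) + \lambda g(x - 1, w)$$

for $x = 1, 2, 3, \ldots$. If $g(0, w) = e^{-\lambda w}$, show, by mathematical induction,
that

$$g(x, w) = \frac{(\lambda w)^x e^{-\lambda w}}{x!}, \qquad x = 1, 2, 3, \ldots.$$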


3.28. Let the number of chocolate drops in a certain type of cookie have a
Poisson distribution. We want the probability that a cookie of this type
contains at least two chocolate drops to be greater than 0.99. Find the
smallest value that the mean of the distribution can take.

3.29. Compute the measures of skewness and kurtosis of the Poisson
distribution with mean $\mu$.

3.30. On the average a grocer sells 3 of a certain article per week. How many
of these should he have in stock so that the chance of his running out within
a week will be less than 0.01? Assume a Poisson distribution.

3.31. Let $X$ have a Poisson distribution. If $\Pr(X = 1) = \Pr(X = 3)$, find the
mode of the distribution.

3.32. Let $X$ have a Poisson distribution with mean 1. Compute, if it exists,
the expected value $E(X!)$.
3.33. Let $X$ and $Y$ have the joint p.d.f. $f(x, y) = e^{-2}/[x!(y - x)!]$, $y = 0, 1, 2, \ldots$;
$x = 0, 1, \ldots, y$, zero elsewhere.

(a) Find the m.g.f. $M(t_1, t_2)$ of this joint distribution.

(b) Compute the means, the variances, and the correlation coefficient of $X$
and $Y$.

(c) Determine the conditional mean $E(X \mid y)$.

Hint: Note that

$$\sum_{x=0}^{y} [\exp(t_1 x)]\, \frac{y!}{x!(y - x)!} = [1 + \exp(t_1)]^y.$$

Why?


3.3 The Gamma and Chi-Square Distributions

In this section we introduce the gamma and chi-square distributions.
It is proved in books on advanced calculus that the integral

$$\int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy$$

exists for $\alpha > 0$ and that the value of the integral is a positive number.
The integral is called the gamma function of $\alpha$, and we write

$$\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} e^{-y}\, dy.$$

If $\alpha = 1$, clearly

$$\Gamma(1) = \int_0^{\infty} e^{-y}\, dy = 1.$$

If $\alpha > 1$, an integration by parts shows that

$$\Gamma(\alpha) = (\alpha - 1) \int_0^{\infty} y^{\alpha - 2} e^{-y}\, dy = (\alpha - 1)\Gamma(\alpha - 1).$$

Accordingly, if $\alpha$ is a positive integer greater than 1,

$$\Gamma(\alpha) = (\alpha - 1)(\alpha - 2) \cdots (3)(2)(1)\Gamma(1) = (\alpha - 1)!.$$

Since $\Gamma(1) = 1$, this suggests that we take $0! = 1$, as we have done.
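The properties of the gamma function just listed are easy to check numerically. The sketch below uses Python's built-in `math.gamma` together with a crude midpoint-rule approximation of the defining integral; the test value $\alpha = 3.7$, the upper limit 60, and the step count are assumptions chosen for the illustration.

```python
from math import exp, factorial, gamma, isclose

def gamma_integral(alpha, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of y^(alpha-1) e^(-y)
    over (0, upper); the tail beyond `upper` is negligible for moderate alpha."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += y**(alpha - 1) * exp(-y)
    return total * h

# Gamma(alpha) = (alpha - 1)! for positive integers alpha.
integer_check = all(isclose(gamma(a), factorial(a - 1)) for a in range(1, 10))

# Recursion Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) for a non-integer alpha > 1.
alpha = 3.7
recursion_gap = gamma(alpha) - (alpha - 1) * gamma(alpha - 1)
integral_value = gamma_integral(alpha)
print(integer_check, recursion_gap, integral_value, gamma(alpha))
```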
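In the integral that defines $\Gamma(\alpha)$, let us introduce a new variable $x$
by writing $y = x/\beta$, where $\beta > 0$. Then

$$\Gamma(\alpha) = \int_0^{\infty} \left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-x/\beta} \left(\frac{1}{\beta}\right) dx,$$

or, equivalently,

$$1 = \int_0^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}\, dx.$$

Since $\alpha > 0$, $\beta > 0$, and $\Gamma(\alpha) > 0$, we see that

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}, & 0 < x < \infty, \\[1ex] 0 & \text{elsewhere}, \end{cases}$$

is a p.d.f. of a random variable of the continuous type. A random
variable $X$ that has a p.d.f. of this form is said to have a gamma
distribution with parameters $\alpha$ and $\beta$, and any such $f(x)$ is called a
gamma-type p.d.f.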

Remark. The gamma distribution is frequently the probability model for
waiting times; for instance, in life testing, the waiting time until "death" is the
random variable which frequently has a gamma distribution. To see this, let
us assume the postulates of a Poisson process and let the interval of length
$w$ be a time interval. Specifically, let the random variable $W$ be the time that
is needed to obtain exactly $k$ changes (possibly deaths), where $k$ is a fixed
positive integer. Then the distribution function of $W$ is

$$G(w) = \Pr(W \le w) = 1 - \Pr(W > w).$$

However, the event $W > w$, for $w > 0$, is equivalent to the event in which there
are fewer than $k$ changes in a time interval of length $w$. That is, if the random
variable $X$ is the number of changes in an interval of length $w$, then

$$\Pr(W > w) = \sum_{x=0}^{k-1} \Pr(X = x) = \sum_{x=0}^{k-1} \frac{(\lambda w)^x e^{-\lambda w}}{x!}.$$

It is left as an exercise to verify that

$$\int_{\lambda w}^{\infty} \frac{z^{k-1} e^{-z}}{(k - 1)!}\, dz = \sum_{x=0}^{k-1} \frac{(\lambda w)^x e^{-\lambda w}}{x!}.$$

If, momentarily, we accept this result, we have, for $w > 0$,

$$G(w) = 1 - \int_{\lambda w}^{\infty} \frac{z^{k-1} e^{-z}}{\Gamma(k)}\, dz = \int_0^{\lambda w} \frac{z^{k-1} e^{-z}}{\Gamma(k)}\, dz,$$

and for $w \le 0$, $G(w) = 0$. If we change the variable of integration in the
integral that defines $G(w)$ by writing $z = \lambda y$, then

$$G(w) = \int_0^{w} \frac{\lambda^k y^{k-1} e^{-\lambda y}}{\Gamma(k)}\, dy, \qquad w > 0,$$

and $G(w) = 0$, $w \le 0$. Accordingly, the p.d.f. of $W$ is

$$g(w) = G'(w) = \begin{cases} \dfrac{\lambda^k w^{k-1} e^{-\lambda w}}{\Gamma(k)}, & 0 < w < \infty, \\[1ex] 0 & \text{elsewhere}. \end{cases}$$

That is, $W$ has a gamma distribution with $\alpha = k$ and $\beta = 1/\lambda$. If $W$ is the
waiting time until the first change, that is, if $k = 1$, the p.d.f. of $W$ is

$$g(w) = \begin{cases} \lambda e^{-\lambda w}, & 0 < w < \infty, \\ 0 & \text{elsewhere}, \end{cases}$$

and $W$ is said to have an exponential distribution with mean $\beta = 1/\lambda$.
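The link between the waiting-time distribution function and the Poisson sum can be checked numerically before it is proved. The sketch below computes $G(w)$ in two ways: by a midpoint-rule integral of the gamma p.d.f. with $\alpha = k$, $\beta = 1/\lambda$, and by $1 - \sum_{x=0}^{k-1} (\lambda w)^x e^{-\lambda w}/x!$. The values $k = 3$, $\lambda = 0.5$, $w = 4$ are assumptions for the illustration.

```python
from math import exp, factorial

def waiting_cdf_by_integration(w, k, lam, steps=100_000):
    """Midpoint-rule integral of lam^k y^(k-1) e^(-lam*y) / (k-1)! over (0, w)."""
    h = w / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += lam**k * y**(k - 1) * exp(-lam * y) / factorial(k - 1)
    return total * h

def waiting_cdf_by_poisson(w, k, lam):
    """G(w) = 1 - Pr(fewer than k changes in an interval of length w)."""
    return 1.0 - sum((lam * w)**x * exp(-lam * w) / factorial(x) for x in range(k))

k, lam, w = 3, 0.5, 4.0   # illustrative values (assumed)
by_integration = waiting_cdf_by_integration(w, k, lam)
by_poisson = waiting_cdf_by_poisson(w, k, lam)
print(by_integration, by_poisson)   # the two values should agree closely
```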

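We now find the m.g.f. of a gamma distribution. Since

$$M(t) = \int_0^{\infty} e^{tx}\, \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta}\, dx = \int_0^{\infty} \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha - 1} e^{-x(1 - \beta t)/\beta}\, dx,$$

we may set $y = x(1 - \beta t)/\beta$, $t < 1/\beta$, or $x = \beta y/(1 - \beta t)$, to obtain

$$M(t) = \int_0^{\infty} \frac{\beta/(1 - \beta t)}{\Gamma(\alpha)\beta^{\alpha}} \left(\frac{\beta y}{1 - \beta t}\right)^{\alpha - 1} e^{-y}\, dy.$$

That is,

$$M(t) = \left(\frac{1}{1 - \beta t}\right)^{\alpha} \int_0^{\infty} \frac{1}{\Gamma(\alpha)}\, y^{\alpha - 1} e^{-y}\, dy = \frac{1}{(1 - \beta t)^{\alpha}}, \qquad t < \frac{1}{\beta}.$$

Now

$$M'(t) = (-\alpha)(1 - \beta t)^{-\alpha - 1}(-\beta)$$

and

$$M''(t) = (-\alpha)(-\alpha - 1)(1 - \beta t)^{-\alpha - 2}(-\beta)^2.$$

Hence, for a gamma distribution, we have

$$\mu = M'(0) = \alpha\beta$$

and

$$\sigma^2 = M''(0) - \mu^2 = \alpha(\alpha + 1)\beta^2 - \alpha^2\beta^2 = \alpha\beta^2.$$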
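As a rough check on $\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$, the sketch below differentiates the m.g.f. $M(t) = (1 - \beta t)^{-\alpha}$ numerically at $t = 0$ with central differences. The values $\alpha = 4$, $\beta = 3$ and the step size are assumptions; finite differences give only approximate derivatives, so the comparison uses a loose tolerance.

```python
def gamma_mgf(t, alpha, beta):
    """M(t) = (1 - beta*t)^(-alpha), defined for t < 1/beta."""
    if t >= 1.0 / beta:
        raise ValueError("the m.g.f. is defined only for t < 1/beta")
    return (1.0 - beta * t) ** (-alpha)

def moments_from_mgf(alpha, beta, h=1e-4):
    """Approximate mean M'(0) and variance M''(0) - M'(0)^2 by central differences."""
    m_plus = gamma_mgf(h, alpha, beta)
    m_zero = gamma_mgf(0.0, alpha, beta)
    m_minus = gamma_mgf(-h, alpha, beta)
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / h**2
    return first, second - first**2

alpha, beta = 4.0, 3.0   # illustrative parameter values (assumed)
mean, variance = moments_from_mgf(alpha, beta)
print(mean, alpha * beta)          # approximately 12
print(variance, alpha * beta**2)   # approximately 36
```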
Example 1. Let the waiting time $W$ have a gamma p.d.f. with $\alpha = k$ and
$\beta = 1/\lambda$. Accordingly, $E(W) = k/\lambda$. If $k = 1$, then $E(W) = 1/\lambda$; that is, the
expected waiting time for $k = 1$ changes is equal to the reciprocal of $\lambda$.

Example 2. Let $X$ be a random variable such that

$$E(X^m) = \frac{(m + 3)!}{3!}\, 3^m, \qquad m = 1, 2, 3, \ldots.$$

Then the m.g.f. of $X$ is given by the series

$$M(t) = 1 + \frac{4!\,3}{3!\,1!}\, t + \frac{5!\,3^2}{3!\,2!}\, t^2 + \frac{6!\,3^3}{3!\,3!}\, t^3 + \cdots.$$

This, however, is the Maclaurin's series for $(1 - 3t)^{-4}$, provided that
$-1 < 3t < 1$. Accordingly, $X$ has a gamma distribution with $\alpha = 4$ and $\beta = 3$.

Remark. The gamma distribution is not only a good model for waiting
times, but one for many nonnegative random variables of the continuous type.
For illustration, the distribution of certain incomes could be modeled
satisfactorily by the gamma distribution, since the two parameters $\alpha$ and $\beta$
provide a great deal of flexibility. Several gamma probability density functions
are depicted in Figure 3.1.

Let us now consider the special case of the gamma distribution in
which $\alpha = r/2$, where $r$ is a positive integer, and $\beta = 2$. A random
variable $X$ of the continuous type that has the p.d.f.

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(r/2)\,2^{r/2}}\, x^{r/2 - 1} e^{-x/2}, & 0 < x < \infty, \\[1ex] 0 & \text{elsewhere}, \end{cases}$$

and the m.g.f.

$$M(t) = (1 - 2t)^{-r/2}, \qquad t < \tfrac{1}{2},$$

is said to have a chi-square distribution, and any $f(x)$ of this form is
called a chi-square p.d.f. The mean and the variance of a chi-square
distribution are $\mu = \alpha\beta = (r/2)2 = r$ and $\sigma^2 = \alpha\beta^2 = (r/2)2^2 = 2r$,
respectively. For no obvious reason, we call the parameter $r$ the number
of degrees of freedom of the chi-square distribution (or of
the chi-square p.d.f.). Because the chi-square distribution has an
important role in statistics and occurs so frequently, we write, for
brevity, that $X$ is $\chi^2(r)$ to mean that the random variable $X$ has a
chi-square distribution with $r$ degrees of freedom.
Example 3. If $X$ has the p.d.f.

$$f(x) = \begin{cases} \tfrac{1}{4}\, x e^{-x/2}, & 0 < x < \infty, \\ 0 & \text{elsewhere}, \end{cases}$$

then $X$ is $\chi^2(4)$. Hence $\mu = 4$, $\sigma^2 = 8$, and $M(t) = (1 - 2t)^{-2}$, $t < \frac{1}{2}$.

Example 4. If $X$ has the m.g.f. $M(t) = (1 - 2t)^{-8}$, $t < \frac{1}{2}$, then $X$ is $\chi^2(16)$.

If the random variable $X$ is $\chi^2(r)$, then, with $c_1 < c_2$, we have

$$\Pr(c_1 \le X \le c_2) = \Pr(X \le c_2) - \Pr(X \le c_1),$$

since $\Pr(X = c_1) = 0$. To compute such a probability, we need the
value of an integral like

$$\Pr(X \le x) = \int_0^{x} \frac{1}{\Gamma(r/2)\,2^{r/2}}\, w^{r/2 - 1} e^{-w/2}\, dw.$$

FIGURE 3.1

Tables of this integral for selected values of $r$ and $x$ have been prepared
and are partially reproduced in Table II in Appendix B.

Example 5. Let $X$ be $\chi^2(10)$. Then, by Table II of Appendix B, with $r = 10$,

$$\Pr(3.25 \le X \le 20.5) = \Pr(X \le 20.5) - \Pr(X \le 3.25) = 0.975 - 0.025 = 0.95.$$

Again, by way of example, if $\Pr(a < X) = 0.05$, then $\Pr(X \le a) = 0.95$, and
thus $a = 18.3$ from Table II with $r = 10$.
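When $r$ is even, the table values in Example 5 can be reproduced without numerical integration. A $\chi^2(r)$ variable is gamma with $\alpha = r/2$ and $\beta = 2$, so for integer $k = r/2$ the waiting-time relationship with $\lambda = \frac{1}{2}$ gives $\Pr(X \le x) = 1 - \sum_{j=0}^{k-1} (x/2)^j e^{-x/2}/j!$. The function name below is hypothetical, and the closed form applies only to even $r$.

```python
from math import exp, factorial

def chi_square_cdf_even(x, r):
    """Pr(X <= x) for X that is chi-square with an even number r of degrees
    of freedom, using the Poisson-sum form of the gamma distribution function."""
    if r <= 0 or r % 2 != 0:
        raise ValueError("this closed form needs a positive even r")
    if x <= 0:
        return 0.0
    k, half = r // 2, x / 2.0
    return 1.0 - sum(half**j * exp(-half) / factorial(j) for j in range(k))

upper = chi_square_cdf_even(20.5, 10)        # Table II gives 0.975
lower = chi_square_cdf_even(3.25, 10)        # Table II gives 0.025
prob = upper - lower                         # Example 5: about 0.95
tail = 1.0 - chi_square_cdf_even(18.3, 10)   # Pr(18.3 < X), about 0.05
print(round(upper, 3), round(lower, 3), round(prob, 3), round(tail, 3))
```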


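Example 6. Let $X$ have a gamma distribution with $\alpha = r/2$, where $r$ is a
positive integer, and $\beta > 0$. Define the random variable $Y = 2X/\beta$. We seek
the p.d.f. of $Y$. Now the distribution function of $Y$ is

$$G(y) = \Pr(Y \le y) = \Pr\left(X \le \frac{\beta y}{2}\right).$$

If $y \le 0$, then $G(y) = 0$; but if $y > 0$, then

$$G(y) = \int_0^{\beta y/2} \frac{1}{\Gamma(r/2)\beta^{r/2}}\, x^{r/2 - 1} e^{-x/\beta}\, dx.$$

Accordingly, the p.d.f. of $Y$ is

$$g(y) = G'(y) = \frac{\beta/2}{\Gamma(r/2)\beta^{r/2}} \left(\frac{\beta y}{2}\right)^{r/2 - 1} e^{-y/2} = \frac{1}{\Gamma(r/2)\,2^{r/2}}\, y^{r/2 - 1} e^{-y/2}$$

if $y > 0$. That is, $Y$ is $\chi^2(r)$.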


EXERCISES 

3.34. If $(1 - 2t)^{-6}$, $t < \frac{1}{2}$, is the m.g.f. of the random variable $X$, find
$\Pr(X < 5.23)$.

3.35. If $X$ is $\chi^2(5)$, determine the constants $c$ and $d$ so that
$\Pr(c < X < d) = 0.95$ and $\Pr(X < c) = 0.025$.

3.36. If $X$ has a gamma distribution with $\alpha = 3$ and $\beta = 4$, find
$\Pr(3.28 < X < 25.2)$.

Hint: Consider the probability of the equivalent event $1.64 < Y < 12.6$,
where $Y = 2X/4 = X/2$.

3.37. Let $X$ be a random variable such that $E(X^m) = (m + 1)!\,2^m$,
$m = 1, 2, 3, \ldots$. Determine the m.g.f. and the distribution of $X$.
3.38. Show that

$$\int_{\mu}^{\infty} \frac{1}{\Gamma(k)}\, z^{k-1} e^{-z}\, dz = \sum_{x=0}^{k-1} \frac{\mu^x e^{-\mu}}{x!}, \qquad k = 1, 2, 3, \ldots.$$

This demonstrates the relationship between the distribution functions of
the gamma and Poisson distributions.

Hint: Either integrate by parts $k - 1$ times or simply note that the
"antiderivative" of $z^{k-1} e^{-z}$ is

$$-z^{k-1} e^{-z} - (k - 1) z^{k-2} e^{-z} - \cdots - (k - 1)!\, e^{-z}$$

by differentiating the latter expression.

3.39. Let $X_1$, $X_2$, and $X_3$ be independent random variables, each with p.d.f.
$f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Find the distribution of
$Y = \text{minimum}(X_1, X_2, X_3)$.

Hint: $\Pr(Y \le y) = 1 - \Pr(Y > y) = 1 - \Pr(X_i > y,\ i = 1, 2, 3)$.

3.40. Let $X$ have a gamma distribution with p.d.f.

$$f(x) = \frac{1}{\beta^2}\, x e^{-x/\beta}, \qquad 0 < x < \infty,$$

zero elsewhere. If $x = 2$ is the unique mode of the distribution, find the
parameter $\beta$ and $\Pr(X < 9.49)$.

3.41. Compute the measures of skewness and kurtosis of a gamma
distribution with parameters $\alpha$ and $\beta$.

3.42. Let $X$ have a gamma distribution with parameters $\alpha$ and $\beta$. Show that
$\Pr(X \ge 2\alpha\beta) \le (2/e)^{\alpha}$.

Hint: Use the result of Exercise 1.115.

3.43. Give a reasonable definition of a chi-square distribution with zero
degrees of freedom.

Hint: Work with the m.g.f. of a distribution that is $\chi^2(r)$ and let $r = 0$.

3.44. In the Poisson postulates on page 127, let $\lambda$ be a nonnegative function
of $w$, say $\lambda(w)$, such that $D_w[g(0, w)] = -\lambda(w) g(0, w)$. Suppose that
$\lambda(w) = krw^{r-1}$, $r \ge 1$.

(a) Find $g(0, w)$ noting that $g(0, 0) = 1$.

(b) Let $W$ be the time that is needed to obtain exactly one change. Then
find the distribution function of $W$, namely $G(w) = \Pr(W \le w) =
1 - \Pr(W > w) = 1 - g(0, w)$, $0 \le w$, and then find the p.d.f. of $W$.
This p.d.f. is that of the Weibull distribution, which is used in the study
of breaking strengths of materials.

3.45. Let $X$ have a Poisson distribution with parameter $m$. If $m$ is an
experimental value of a random variable having a gamma distribution with
$\alpha = 2$ and $\beta = 1$, compute $\Pr(X = 0, 1, 2)$.

3.46. Let $X$ have the uniform distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero
elsewhere. Find the distribution function of $Y = -2 \ln X$. What is the p.d.f.
of $Y$?

3.47. Find the uniform distribution of the continuous type that has the same
mean and the same variance as those of a chi-square distribution with 8
degrees of freedom.


3.4 The Normal Distribution


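Consider the integral

$$I = \int_{-\infty}^{\infty} \exp\left(-\frac{y^2}{2}\right) dy.$$

This integral exists because the integrand is a positive continuous
function which is bounded by an integrable function; that is,

$$0 < \exp\left(-\frac{y^2}{2}\right) < \exp(-|y| + 1), \qquad -\infty < y < \infty,$$

and

$$\int_{-\infty}^{\infty} \exp(-|y| + 1)\, dy = 2e.$$

To evaluate the integral $I$, we note that $I > 0$ and that $I^2$ may be written

$$I^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left(-\frac{y^2 + z^2}{2}\right) dy\, dz.$$

This iterated integral can be evaluated by changing to polar
coordinates. If we set $y = r\cos\theta$ and $z = r\sin\theta$, we have

$$I^2 = \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\, dr\, d\theta = \int_0^{2\pi} d\theta = 2\pi.$$

Accordingly, $I = \sqrt{2\pi}$ and

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy = 1.$$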
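The value $I = \sqrt{2\pi}$ can be confirmed numerically. The sketch below applies the midpoint rule on the interval $(-12, 12)$; the cutoff and the number of steps are assumptions, chosen so that the neglected tails are far below the tolerance of interest.

```python
from math import exp, pi, sqrt

def gaussian_integral(limit=12.0, steps=200_000):
    """Midpoint-rule approximation of the integral of exp(-y^2/2)
    over (-limit, limit); the tails beyond the limit are negligible."""
    h = 2.0 * limit / steps
    total = 0.0
    for i in range(steps):
        y = -limit + (i + 0.5) * h
        total += exp(-0.5 * y * y)
    return total * h

approx = gaussian_integral()
print(approx, sqrt(2 * pi))   # the two values should agree closely
```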

















If we introduce a new variable of integration, say $x$, by writing

$$y = \frac{x - a}{b}, \qquad b > 0,$$

the preceding integral becomes

$$\int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right] dx = 1.$$

Since $b > 0$, this implies that

$$f(x) = \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right], \qquad -\infty < x < \infty,$$

satisfies the conditions of being a p.d.f. of a continuous type of random
variable. A random variable of the continuous type that has a
p.d.f. of the form of $f(x)$ is said to have a normal distribution, and any
$f(x)$ of this form is called a normal p.d.f.

We can find the m.g.f. of a normal distribution as follows. In

$$M(t) = \int_{-\infty}^{\infty} e^{tx}\, \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a)^2}{2b^2}\right] dx = \int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left(-\frac{-2b^2tx + x^2 - 2ax + a^2}{2b^2}\right) dx$$

we complete the square in the exponent. Thus $M(t)$ becomes

$$M(t) = \exp\left[-\frac{a^2 - (a + b^2t)^2}{2b^2}\right] \int_{-\infty}^{\infty} \frac{1}{b\sqrt{2\pi}} \exp\left[-\frac{(x - a - b^2t)^2}{2b^2}\right] dx = \exp\left(at + \frac{b^2t^2}{2}\right)$$

because the integrand of the last integral can be thought of as a normal
p.d.f. with $a$ replaced by $a + b^2t$, and hence it is equal to 1.

The mean $\mu$ and variance $\sigma^2$ of a normal distribution will be
calculated from $M(t)$. Now

$$M'(t) = M(t)(a + b^2t)$$

and

$$M''(t) = M(t)(b^2) + M(t)(a + b^2t)^2.$$

Thus

$$\mu = M'(0) = a$$

and

$$\sigma^2 = M''(0) - \mu^2 = b^2 + a^2 - a^2 = b^2.$$

This permits us to write a normal p.d.f. in the form of

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,$$

a form that shows explicitly the values of $\mu$ and $\sigma^2$. The m.g.f. $M(t)$ can
be written

$$M(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).$$


Example 1. If $X$ has the m.g.f.

$$M(t) = e^{2t + 32t^2},$$

then $X$ has a normal distribution with $\mu = 2$, $\sigma^2 = 64$.

The normal p.d.f. occurs so frequently in certain parts of statistics
that we denote it, for brevity, by $N(\mu, \sigma^2)$. Thus, if we say that the
random variable $X$ is $N(0, 1)$, we mean that $X$ has a normal distribution
with mean $\mu = 0$ and variance $\sigma^2 = 1$, so that the p.d.f. of $X$ is

$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad -\infty < x < \infty.$$

If we say that $X$ is $N(5, 4)$, we mean that $X$ has a normal distribution
with mean $\mu = 5$ and variance $\sigma^2 = 4$, so that the p.d.f. of $X$ is

$$f(x) = \frac{1}{2\sqrt{2\pi}} \exp\left[-\frac{(x - 5)^2}{2(4)}\right], \qquad -\infty < x < \infty.$$

Moreover, if

$$M(t) = e^{t^2/2},$$

then $X$ is $N(0, 1)$.

The graph of

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,$$

is seen (1) to be symmetric about a vertical axis through $x = \mu$, (2)
to have its maximum of $1/(\sigma\sqrt{2\pi})$ at $x = \mu$, and (3) to have the $x$-axis
as a horizontal asymptote. It should also be verified that (4) there
are points of inflection at $x = \mu \pm \sigma$.

Remark. Each of the special distributions considered thus far has been
"justified" by some derivation that is based upon certain concepts found
in elementary probability theory. Such a motivation for the normal
distribution is not given at this time; a motivation is presented in Chapter 5.
However, the normal distribution is one of the more widely used distributions
in applications of statistical methods. Variables that are often assumed to be
random variables having normal distributions (with appropriate values of $\mu$
and $\sigma$) are the diameter of a hole made by a drill press, the score on a test,
the yield of a grain on a plot of ground, and the length of a newborn child.

We now prove a very useful theorem.


Theorem 1. If the random variable $X$ is $N(\mu, \sigma^2)$, $\sigma^2 > 0$, then the
random variable $W = (X - \mu)/\sigma$ is $N(0, 1)$.

Proof. The distribution function $G(w)$ of $W$ is, since $\sigma > 0$,

$$G(w) = \Pr\left(\frac{X - \mu}{\sigma} \le w\right) = \Pr(X \le w\sigma + \mu).$$

That is,

$$G(w) = \int_{-\infty}^{w\sigma + \mu} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] dx.$$

If we change the variable of integration by writing $y = (x - \mu)/\sigma$, then

$$G(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy.$$

Accordingly, the p.d.f. $g(w) = G'(w)$ of the continuous-type random
variable $W$ is

$$g(w) = \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}, \qquad -\infty < w < \infty.$$

Thus $W$ is $N(0, 1)$, which is the desired result (see also Exercise 3.100).

This fact considerably simplifies the calculations of probabilities 
concerning normally distributed variables, as will be seen presently. 
FIGURE 3.2


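Suppose that $X$ is $N(\mu, \sigma^2)$. Then, with $c_1 < c_2$ we have, since
$\Pr(X = c_1) = 0$,

$$\begin{aligned}
\Pr(c_1 < X < c_2) &= \Pr(X < c_2) - \Pr(X < c_1) \\
&= \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_2 - \mu}{\sigma}\right) - \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_1 - \mu}{\sigma}\right) \\
&= \int_{-\infty}^{(c_2 - \mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw - \int_{-\infty}^{(c_1 - \mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw
\end{aligned}$$

because $W = (X - \mu)/\sigma$ is $N(0, 1)$. That is, probabilities concerning $X$,
which is $N(\mu, \sigma^2)$, can be expressed in terms of probabilities concerning
$W$, which is $N(0, 1)$.

An integral such as

$$\int_{-\infty}^{k} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw$$

cannot be evaluated by the fundamental theorem of calculus because
an "antiderivative" of $e^{-w^2/2}$ is not expressible as an elementary
function. Instead, tables of the approximate value of this integral for
various values of $k$ have been prepared and are partially reproduced
in Table III in Appendix B. We use the notation

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw.$$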
Moreover, we say that $\Phi(z)$ and its derivative $\Phi'(z) = \varphi(z)$ are,
respectively, the distribution function and p.d.f. of a standard normal
distribution $N(0, 1)$. These are depicted in Figure 3.2.

To summarize, we have shown that if $X$ is $N(\mu, \sigma^2)$, then

$$\begin{aligned}
\Pr(c_1 < X < c_2) &= \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_2 - \mu}{\sigma}\right) - \Pr\left(\frac{X - \mu}{\sigma} < \frac{c_1 - \mu}{\sigma}\right) \\
&= \Phi\left(\frac{c_2 - \mu}{\sigma}\right) - \Phi\left(\frac{c_1 - \mu}{\sigma}\right).
\end{aligned}$$

It is left as an exercise to show that $\Phi(-x) = 1 - \Phi(x)$.
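Example 2. Let $X$ be $N(2, 25)$. Then, by Table III,

$$\begin{aligned}
\Pr(0 < X < 10) &= \Phi\left(\frac{10 - 2}{5}\right) - \Phi\left(\frac{0 - 2}{5}\right) \\
&= \Phi(1.6) - \Phi(-0.4) \\
&= 0.945 - (1 - 0.655) = 0.600
\end{aligned}$$

and

$$\begin{aligned}
\Pr(-8 < X < 1) &= \Phi\left(\frac{1 - 2}{5}\right) - \Phi\left(\frac{-8 - 2}{5}\right) \\
&= \Phi(-0.2) - \Phi(-2) \\
&= (1 - 0.579) - (1 - 0.977) = 0.398.
\end{aligned}$$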


Example 3. Let $X$ be $N(\mu, \sigma^2)$. Then, by Table III,

$$\begin{aligned}
\Pr(\mu - 2\sigma < X < \mu + 2\sigma) &= \Phi\left(\frac{\mu + 2\sigma - \mu}{\sigma}\right) - \Phi\left(\frac{\mu - 2\sigma - \mu}{\sigma}\right) \\
&= \Phi(2) - \Phi(-2) \\
&= 0.977 - (1 - 0.977) = 0.954.
\end{aligned}$$

Example 4. Suppose that 10 percent of the probability for a certain
distribution that is $N(\mu, \sigma^2)$ is below 60 and that 5 percent is above 90. What
are the values of $\mu$ and $\sigma$? We are given that the random variable $X$
is $N(\mu, \sigma^2)$ and that $\Pr(X \le 60) = 0.10$ and $\Pr(X \le 90) = 0.95$. Thus
$\Phi[(60 - \mu)/\sigma] = 0.10$ and $\Phi[(90 - \mu)/\sigma] = 0.95$. From Table III we have

$$\frac{60 - \mu}{\sigma} = -1.282, \qquad \frac{90 - \mu}{\sigma} = 1.645.$$

These conditions require that $\mu = 73.1$ and $\sigma = 10.2$, approximately.
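The table lookups in Examples 2 to 4 can be reproduced with the error function, since $\Phi(z) = \frac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$. The helper names in this sketch are illustrative; results are expected to match Table III only to about three decimal places, and Example 4 reuses the table quantiles $-1.282$ and $1.645$ quoted above.

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_interval_prob(c1, c2, mu, sigma):
    """Pr(c1 < X < c2) when X is N(mu, sigma^2), by standardizing."""
    return Phi((c2 - mu) / sigma) - Phi((c1 - mu) / sigma)

ex2_a = normal_interval_prob(0, 10, 2, 5)    # Example 2: about 0.600
ex2_b = normal_interval_prob(-8, 1, 2, 5)    # Example 2: about 0.398
ex3 = normal_interval_prob(-2, 2, 0, 1)      # Example 3: about 0.954

# Example 4: solve (60 - mu)/sigma = -1.282 and (90 - mu)/sigma = 1.645.
z_low, z_high = -1.282, 1.645
sigma = (90 - 60) / (z_high - z_low)
mu = 60 - z_low * sigma
print(round(ex2_a, 3), round(ex2_b, 3), round(ex3, 3), round(mu, 1), round(sigma, 1))
```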

Remark. In this chapter we have illustrated three types of parameters
associated with distributions. The mean $\mu$ of $N(\mu, \sigma^2)$ is called a location
parameter because changing its value simply changes the location of the
middle of the normal p.d.f.; that is, the graph of the p.d.f. looks exactly the
same except for a shift in location. The standard deviation $\sigma$ of $N(\mu, \sigma^2)$ is called
a scale parameter because changing its value changes the spread of the
distribution. That is, a small value of $\sigma$ requires the graph of the normal
p.d.f. to be tall and narrow, while a large value of $\sigma$ requires it to spread out
and not be so tall. No matter what the values of $\mu$ and $\sigma$, however, the graph
of the normal p.d.f. will be that familiar "bell shape." Incidentally, the $\beta$ of
the gamma distribution is also a scale parameter. On the other hand, the $\alpha$
of the gamma distribution is called a shape parameter, as changing its value
modifies the shape of the graph of the p.d.f., as can be seen by referring to
Figure 3.1. The parameters $p$ and $\mu$ of the binomial and Poisson distributions,
respectively, are also shape parameters.

We close this section with an important theorem.


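Theorem 2. If the random variable $X$ is $N(\mu, \sigma^2)$, $\sigma^2 > 0$, then the
random variable $V = (X - \mu)^2/\sigma^2$ is $\chi^2(1)$.

Proof. Because $V = W^2$, where $W = (X - \mu)/\sigma$ is $N(0, 1)$, the
distribution function $G(v)$ of $V$ is, for $v \ge 0$,

$$G(v) = \Pr(W^2 \le v) = \Pr(-\sqrt{v} \le W \le \sqrt{v}).$$

That is,

$$G(v) = 2\int_0^{\sqrt{v}} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw, \qquad 0 \le v,$$

and

$$G(v) = 0, \qquad v < 0.$$

If we change the variable of integration by writing $w = \sqrt{y}$, then

$$G(v) = \int_0^{v} \frac{1}{\sqrt{2\pi}\sqrt{y}}\, e^{-y/2}\, dy, \qquad 0 \le v.$$

Hence the p.d.f. $g(v) = G'(v)$ of the continuous-type random variable
$V$ is

$$g(v) = \begin{cases} \dfrac{1}{\sqrt{\pi}\sqrt{2}}\, v^{1/2 - 1} e^{-v/2}, & 0 < v < \infty, \\[1ex] 0 & \text{elsewhere}. \end{cases}$$

Since $g(v)$ is a p.d.f. and hence

$$\int_0^{\infty} g(v)\, dv = 1,$$

it must be that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$ and thus $V$ is $\chi^2(1)$.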
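Two facts from the proof of Theorem 2 lend themselves to a quick numerical check: that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, and that differentiating $G(v) = 2\Phi(\sqrt{v}) - 1$ gives the stated $\chi^2(1)$ p.d.f. The sketch below uses a central difference at the arbitrarily chosen point $v = 1.3$; the function names are illustrative.

```python
from math import erf, exp, gamma, pi, sqrt

def Phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def chi_square_1_cdf(v):
    """G(v) = Pr(-sqrt(v) <= W <= sqrt(v)) = 2*Phi(sqrt(v)) - 1 for v >= 0."""
    return 0.0 if v < 0 else 2.0 * Phi(sqrt(v)) - 1.0

def chi_square_1_pdf(v):
    """g(v) = v^(1/2 - 1) e^(-v/2) / (sqrt(pi) * sqrt(2)) for v > 0."""
    return 0.0 if v <= 0 else v ** (-0.5) * exp(-v / 2.0) / (sqrt(pi) * sqrt(2.0))

gamma_half_gap = gamma(0.5) - sqrt(pi)

v, h = 1.3, 1e-5   # evaluation point and step size (assumed values)
derivative = (chi_square_1_cdf(v + h) - chi_square_1_cdf(v - h)) / (2 * h)
print(gamma_half_gap, derivative, chi_square_1_pdf(v))
```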
EXERCISES

3.48. If

$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw,$$

show that $\Phi(-z) = 1 - \Phi(z)$.

3.49. If $X$ is $N(75, 100)$, find $\Pr(X < 60)$ and $\Pr(70 < X < 100)$.

3.50. If $X$ is $N(\mu, \sigma^2)$, find $b$ so that $\Pr[-b < (X - \mu)/\sigma < b] = 0.90$.

3.51. Let $X$ be $N(\mu, \sigma^2)$ so that $\Pr(X < 89) = 0.90$ and $\Pr(X < 94) = 0.95$.
Find $\mu$ and $\sigma^2$.

3.52. Show that the constant $c$ can be selected so that $f(x) = c2^{-x^2}$,
$-\infty < x < \infty$, satisfies the conditions of a normal p.d.f.

Hint: Write $2 = e^{\ln 2}$.

3.53. If $X$ is $N(\mu, \sigma^2)$, show that $E(|X - \mu|) = \sigma\sqrt{2/\pi}$.

3.54. Show that the graph of a p.d.f. $N(\mu, \sigma^2)$ has points of inflection at
$x = \mu - \sigma$ and $x = \mu + \sigma$.

3.55. Evaluate $\int_2^3 \exp[-2(x - 3)^2]\, dx$.

3.56. Determine the ninetieth percentile of the distribution, which is
$N(65, 25)$.

3.57. If $e^{3t + 8t^2}$ is the m.g.f. of the random variable $X$, find $\Pr(-1 < X < 9)$.

3.58. Let the random variable $X$ have the p.d.f.

$$f(x) = \frac{2}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad 0 < x < \infty, \text{ zero elsewhere}.$$

Find the mean and variance of $X$.

Hint: Compute $E(X)$ directly and $E(X^2)$ by comparing that integral with
the integral representing the variance of a variable that is $N(0, 1)$.

3.59. Let $X$ be $N(5, 10)$. Find $\Pr[0.04 < (X - 5)^2 < 38.4]$.

3.60. If $X$ is $N(1, 4)$, compute the probability $\Pr(1 < X^2 < 9)$.

3.61. If $X$ is $N(75, 25)$, find the conditional probability that $X$ is greater than
80 relative to the hypothesis that $X$ is greater than 77. See Exercise 2.18.

3.62. Let $X$ be a random variable such that $E(X^{2m}) = (2m)!/(2^m m!)$,
$m = 1, 2, 3, \ldots$, and $E(X^{2m-1}) = 0$, $m = 1, 2, 3, \ldots$. Find the m.g.f. and
the p.d.f. of $X$.

3.63. Let the mutually independent random variables $X_1$, $X_2$, and $X_3$ be
$N(0, 1)$, $N(2, 4)$, and $N(-1, 1)$, respectively. Compute the probability that
exactly two of these three variables are less than zero.
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3*64* Compute the measures of skewness and kurtosis of a distribution which 
is N(^ a 2 ). 

3.65. Let the random variable $X$ have a distribution that is $N(\mu, \sigma^2)$.

(a) Does the random variable $Y = X^2$ also have a normal distribution?

(b) Would the random variable $Y = aX + b$, $a$ and $b$ nonzero constants,
have a normal distribution?

Hint: In each case, first determine $\Pr(Y \le y)$.

3.66. Let the random variable $X$ be $N(\mu, \sigma^2)$. What would this distribution be
if $\sigma^2 = 0$?

Hint: Look at the m.g.f. of $X$ for $\sigma^2 > 0$ and investigate its limit as
$\sigma^2 \to 0$.

3.67. Let $\varphi(x)$ and $\Phi(x)$ be the p.d.f. and distribution function of a standard
normal distribution. Let $Y$ have a truncated distribution with p.d.f.
$g(y) = \varphi(y)/[\Phi(b) - \Phi(a)]$, $a < y < b$, zero elsewhere. Show that $E(Y)$ is
equal to $[\varphi(a) - \varphi(b)]/[\Phi(b) - \Phi(a)]$.

3.68. Let $f(x)$ and $F(x)$ be the p.d.f. and the distribution function of a
distribution of the continuous type such that $f'(x)$ exists for all $x$. Let the
mean of the truncated distribution that has p.d.f. $g(y) = f(y)/F(b)$,
$-\infty < y < b$, zero elsewhere, be equal to $-f(b)/F(b)$ for all real $b$. Prove
that $f(x)$ is a p.d.f. of a standard normal distribution.

3.69. Let $X$ and $Y$ be independent random variables, each with a distribution
that is $N(0, 1)$. Let $Z = X + Y$. Find the integral that represents the
distribution function $G(z) = \Pr(X + Y \le z)$ of $Z$. Determine the p.d.f.
of $Z$.

Hint: We have that $G(z) = \int_{-\infty}^{\infty} H(x, z)\, dx$, where
$$H(x, z) = \int_{-\infty}^{z - x} \frac{1}{2\pi} \exp[-(x^2 + y^2)/2]\, dy.$$
Find $G'(z)$ by evaluating $\int_{-\infty}^{\infty} [\partial H(x, z)/\partial z]\, dx$.

3.5 The Bivariate Normal Distribution 

Remark. If the reader with an adequate background in matrix algebra so
chooses, this section can be omitted at this point and Section 4.10 can be
considered later. If this decision is made, only an example in Section 4.7 and
a few exercises need be skipped because the bivariate normal distribution
would not be known. Many statisticians, however, find it easier to remember
the multivariate (including the bivariate) normal p.d.f. and m.g.f. using
matrix notation that is used in Section 4.10. Moreover, that section provides
an excellent example of a transformation (in particular, an orthogonal one)
and a good illustration of the moment-generating function technique; these
are two of the major concepts introduced in Chapter 4.

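Let us investigate the function
$$f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-q/2}, \quad -\infty < x < \infty, \ -\infty < y < \infty,$$
where, with $\sigma_1 > 0$, $\sigma_2 > 0$, and $-1 < \rho < 1$,
$$q = \frac{1}{1 - \rho^2}\left[\left(\frac{x - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x - \mu_1}{\sigma_1}\right)\left(\frac{y - \mu_2}{\sigma_2}\right) + \left(\frac{y - \mu_2}{\sigma_2}\right)^2\right].$$
At this point we do not know that the constants $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$
are those respective parameters of a distribution. As a matter of fact,
we do not know that $f(x, y)$ has the properties of a joint p.d.f. It will
be shown that:

1. $f(x, y)$ is a joint p.d.f.

2. $X$ is $N(\mu_1, \sigma_1^2)$ and $Y$ is $N(\mu_2, \sigma_2^2)$.

3. $\rho$ is the correlation coefficient of $X$ and $Y$.

A joint p.d.f. of this form is called a bivariate normal p.d.f., and the
random variables $X$ and $Y$ are said to have a bivariate normal
distribution.

That the nonnegative function $f(x, y)$ is actually a joint p.d.f. can
be seen as follows. Define $f_1(x)$ by
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\, dy.$$
Now
$$(1 - \rho^2)q = \left[\frac{y - \mu_2}{\sigma_2} - \rho\,\frac{x - \mu_1}{\sigma_1}\right]^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2 = \left(\frac{y - b}{\sigma_2}\right)^2 + (1 - \rho^2)\left(\frac{x - \mu_1}{\sigma_1}\right)^2,$$
where $b = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Thus
$$f_1(x) = \frac{\exp[-(x - \mu_1)^2/2\sigma_1^2]}{\sigma_1\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{\exp\{-(y - b)^2/[2\sigma_2^2(1 - \rho^2)]\}}{\sigma_2\sqrt{1 - \rho^2}\sqrt{2\pi}}\, dy.$$
For the purpose of integration, the integrand of the integral in this
expression for $f_1(x)$ may be considered a normal p.d.f. with mean $b$ and
variance $\sigma_2^2(1 - \rho^2)$. Thus this integral is equal to 1 and
$$f_1(x) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left[-\frac{(x - \mu_1)^2}{2\sigma_1^2}\right], \quad -\infty < x < \infty.$$
Since
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\, dy\, dx = \int_{-\infty}^{\infty} f_1(x)\, dx = 1,$$
the nonnegative function $f(x, y)$ is a joint p.d.f. of two continuous-type
random variables $X$ and $Y$. Accordingly, the function $f_1(x)$ is
the marginal p.d.f. of $X$, and $X$ is seen to be $N(\mu_1, \sigma_1^2)$. In like
manner, we see that $Y$ is $N(\mu_2, \sigma_2^2)$.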

Moreover, from the development above, we note that
$$f(x, y) = f_1(x)\left(\frac{1}{\sigma_2\sqrt{1 - \rho^2}\sqrt{2\pi}} \exp\left[-\frac{(y - b)^2}{2\sigma_2^2(1 - \rho^2)}\right]\right),$$
where $b = \mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$. Accordingly, the second factor in the
right-hand member of the equation above is the conditional p.d.f. of
$Y$, given that $X = x$. That is, the conditional p.d.f. of $Y$, given $X = x$,
is itself normal with mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and variance
$\sigma_2^2(1 - \rho^2)$. Thus, with a bivariate normal distribution, the conditional
mean of $Y$, given that $X = x$, is linear in $x$ and is given by
$$E(Y|x) = \mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1).$$
Since the coefficient of $x$ in this linear conditional mean $E(Y|x)$
is $\rho\sigma_2/\sigma_1$, and since $\sigma_1$ and $\sigma_2$ represent the respective standard
deviations, the number $\rho$ is, in fact, the correlation coefficient of $X$ and
$Y$. This follows from the result, established in Section 2.3, that the
coefficient of $x$ in a general linear conditional mean $E(Y|x)$ is the
product of the correlation coefficient and the ratio $\sigma_2/\sigma_1$.

Although the mean of the conditional distribution of $Y$, given
$X = x$, depends upon $x$ (unless $\rho = 0$), the variance $\sigma_2^2(1 - \rho^2)$ is the
same for all real values of $x$. Thus, by way of example, given that $X = x$,
the conditional probability that $Y$ is within $(2.576)\sigma_2\sqrt{1 - \rho^2}$ units of
the conditional mean is 0.99, whatever the value of $x$ may be. In this
sense, most of the probability for the distribution of $X$ and $Y$ lies in
the band
$$\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1) \pm (2.576)\sigma_2\sqrt{1 - \rho^2}$$
about the graph of the linear conditional mean. For every fixed
positive $\sigma_2$, the width of this band depends upon $\rho$. Because the band
is narrow when $\rho^2$ is nearly 1, we see that $\rho$ does measure the intensity
of the concentration of the probability for $X$ and $Y$ about the linear
conditional mean. This is the fact to which we alluded in the remark
of Section 2.3.

In a similar manner we can show that the conditional distribution
of $X$, given $Y = y$, is the normal distribution
$$N\left[\mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(y - \mu_2),\ \sigma_1^2(1 - \rho^2)\right].$$

Example. Let us assume that in a certain population of married
couples the height $X_1$ of the husband and the height $X_2$ of the wife have a
bivariate normal distribution with parameters $\mu_1 = 5.8$ feet, $\mu_2 = 5.3$ feet,
$\sigma_1 = \sigma_2 = 0.2$ foot, and $\rho = 0.6$. The conditional p.d.f. of $X_2$, given $X_1 = 6.3$,
is normal with mean $5.3 + (0.6)(6.3 - 5.8) = 5.6$ and standard deviation
$(0.2)\sqrt{1 - 0.36} = 0.16$. Accordingly, given that the height of the husband
is 6.3 feet, the probability that his wife has a height between 5.28 and 5.92
feet is
$$\Pr(5.28 < X_2 < 5.92 \mid X_1 = 6.3) = \Phi(2) - \Phi(-2) = 0.954.$$
The interval (5.28, 5.92) could be thought of as a 95.4 percent prediction
interval for the wife's height, given $X_1 = 6.3$.
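
The arithmetic of this example can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `phi_cdf` and `conditional_normal` are our own, and the standard normal distribution function is obtained from the error function in Python's `math` module.

```python
import math

def phi_cdf(z):
    """Standard normal distribution function Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_normal(mu1, mu2, sigma1, sigma2, rho, x):
    """Mean and standard deviation of Y given X = x for a bivariate normal."""
    mean = mu2 + rho * (sigma2 / sigma1) * (x - mu1)
    sd = sigma2 * math.sqrt(1.0 - rho ** 2)
    return mean, sd

# Parameters of the example above (heights in feet).
mean, sd = conditional_normal(5.8, 5.3, 0.2, 0.2, 0.6, 6.3)

# Probability that the wife's height lies between 5.28 and 5.92 feet.
prob = phi_cdf((5.92 - mean) / sd) - phi_cdf((5.28 - mean) / sd)
print(round(mean, 2), round(sd, 2), round(prob, 3))
```

The same two functions apply to any conditional probability of this kind, since only the conditional mean and standard deviation change from problem to problem.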


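The m.g.f. of a bivariate normal distribution can be determined as
follows. We have
$$M(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\, dx\, dy = \int_{-\infty}^{\infty} e^{t_1 x} f_1(x)\left[\int_{-\infty}^{\infty} e^{t_2 y} f_{2|1}(y|x)\, dy\right] dx$$
for all real values of $t_1$ and $t_2$. The integral within the brackets is the
m.g.f. of the conditional p.d.f. $f_{2|1}(y|x)$. Since $f_{2|1}(y|x)$ is a normal p.d.f.
with mean $\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1)$ and variance $\sigma_2^2(1 - \rho^2)$, then
$$\int_{-\infty}^{\infty} e^{t_2 y} f_{2|1}(y|x)\, dy = \exp\left\{t_2\left[\mu_2 + \rho\,\frac{\sigma_2}{\sigma_1}(x - \mu_1)\right] + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2}\right\}.$$
Accordingly, $M(t_1, t_2)$ can be written in the form
$$\exp\left\{t_2\mu_2 - t_2\rho\,\frac{\sigma_2}{\sigma_1}\mu_1 + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2}\right\} \int_{-\infty}^{\infty} \exp\left[\left(t_1 + t_2\rho\,\frac{\sigma_2}{\sigma_1}\right)x\right] f_1(x)\, dx.$$
But $E(e^{tX}) = \exp[\mu_1 t + (\sigma_1^2 t^2)/2]$ for all real values of $t$. Accordingly,
if we set $t = t_1 + t_2\rho(\sigma_2/\sigma_1)$, we see that $M(t_1, t_2)$ is given by
$$\exp\left\{t_2\mu_2 - t_2\rho\,\frac{\sigma_2}{\sigma_1}\mu_1 + \frac{t_2^2\sigma_2^2(1 - \rho^2)}{2} + \mu_1\left(t_1 + t_2\rho\,\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2}{2}\left(t_1 + t_2\rho\,\frac{\sigma_2}{\sigma_1}\right)^2\right\}$$
or, equivalently,
$$M(t_1, t_2) = \exp\left(\mu_1 t_1 + \mu_2 t_2 + \frac{\sigma_1^2 t_1^2 + 2\rho\sigma_1\sigma_2 t_1 t_2 + \sigma_2^2 t_2^2}{2}\right).$$
It is interesting to note that if, in this m.g.f. $M(t_1, t_2)$, the correlation
coefficient $\rho$ is set equal to zero, then
$$M(t_1, t_2) = M(t_1, 0)M(0, t_2).$$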

Thus $X$ and $Y$ are independent when $\rho = 0$. If, conversely,
$$M(t_1, t_2) = M(t_1, 0)M(0, t_2),$$
we have $e^{\rho\sigma_1\sigma_2 t_1 t_2} = 1$. Since each of $\sigma_1$ and $\sigma_2$ is positive, then $\rho = 0$.
Accordingly, we have the following theorem.

Theorem 3. Let $X$ and $Y$ have a bivariate normal distribution with
means $\mu_1$ and $\mu_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient
$\rho$. Then $X$ and $Y$ are independent if and only if $\rho = 0$.

As a matter of fact, if any two random variables are independent
and have positive standard deviations, we have noted in Example 4 of
Section 2.4 that $\rho = 0$. However, $\rho = 0$ does not in general imply that
two variables are independent; this can be seen in Exercises 2.20(c) and
2.25. The importance of Theorem 3 lies in the fact that we now know
when and only when two random variables that have a bivariate
normal distribution are independent.

EXERCISES

3.70. Let $X$ and $Y$ have a bivariate normal distribution with respective
parameters $\mu_x = 2.8$, $\mu_y = 110$, $\sigma_x^2 = 0.16$, $\sigma_y^2 = 100$, and $\rho = 0.6$.
Compute:

(a) $\Pr(106 < Y < 124)$.

(b) $\Pr(106 < Y < 124 \mid X = 3.2)$.

3.71. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 3$, $\mu_2 = 1$, $\sigma_1^2 = 16$, $\sigma_2^2 = 25$, and $\rho = \tfrac{3}{5}$. Determine the following
probabilities:

(a) $\Pr(3 < Y < 8)$.

(b) $\Pr(3 < Y < 8 \mid X = 7)$.

(c) $\Pr(-3 < X < 3)$.

(d) $\Pr(-3 < X < 3 \mid Y = -4)$.

3.72. If $M(t_1, t_2)$ is the m.g.f. of a bivariate normal distribution, compute the
covariance by using the formula
$$\frac{\partial^2 M(0, 0)}{\partial t_1\, \partial t_2} - \frac{\partial M(0, 0)}{\partial t_1}\, \frac{\partial M(0, 0)}{\partial t_2}.$$
Now let $\psi(t_1, t_2) = \ln M(t_1, t_2)$. Show that $\partial^2 \psi(0, 0)/\partial t_1\, \partial t_2$ gives this
covariance directly.

3.73. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 5$, $\mu_2 = 10$, $\sigma_1^2 = 1$, $\sigma_2^2 = 25$, and $\rho > 0$. If $\Pr(4 < Y < 16 \mid X = 5) =
0.954$, determine $\rho$.

3.74. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 20$, $\mu_2 = 40$, $\sigma_1^2 = 9$, $\sigma_2^2 = 4$, and $\rho = 0.6$. Find the shortest interval for
which 0.90 is the conditional probability that $Y$ is in this interval, given that
$X = 22$.

3.75. Say the correlation coefficient between the heights of husbands and
wives is 0.70 and the mean male height is 5 feet 10 inches with standard
deviation 2 inches, and the mean female height is 5 feet 4 inches with
standard deviation $1\tfrac{1}{2}$ inches. Assuming a bivariate normal distribution,
what is the best guess of the height of a woman whose husband's height is
6 feet? Find a 95 percent prediction interval for her height.

3.76. Let
$$f(x, y) = (1/2\pi) \exp[-\tfrac{1}{2}(x^2 + y^2)]\{1 + xy \exp[-\tfrac{1}{2}(x^2 + y^2 - 2)]\},$$
where $-\infty < x < \infty$, $-\infty < y < \infty$. If $f(x, y)$ is a joint p.d.f., it is not a
normal bivariate p.d.f. Show that $f(x, y)$ actually is a joint p.d.f. and that
each marginal p.d.f. is normal. Thus the fact that each marginal p.d.f. is
normal does not imply that the joint p.d.f. is bivariate normal.

3.77. Let $X$, $Y$, and $Z$ have the joint p.d.f.
$$\left(\frac{1}{2\pi}\right)^{3/2} \exp\left(-\frac{x^2 + y^2 + z^2}{2}\right)\left[1 + xyz \exp\left(-\frac{x^2 + y^2 + z^2}{2}\right)\right],$$
where $-\infty < x < \infty$, $-\infty < y < \infty$, and $-\infty < z < \infty$. While $X$, $Y$, and
$Z$ are obviously dependent, show that $X$, $Y$, and $Z$ are pairwise independent
and that each pair has a bivariate normal distribution.

3.78. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = \mu_2 = 0$, $\sigma_1^2 = \sigma_2^2 = 1$, and correlation coefficient $\rho$. Find the distribution
of the random variable $Z = aX + bY$ in which $a$ and $b$ are nonzero
constants.

Hint: Write $G(z) = \Pr(Z \le z)$ as an iterated integral and compute
$G'(z) = g(z)$ by differentiating under the first integral sign and then
evaluating the resulting integral by completing the square in the exponent.

ADDITIONAL EXERCISES 

3.79. Let $X$ have a binomial distribution with parameters $n = 288$ and
$p = \tfrac{1}{3}$. Use Chebyshev's inequality to determine a lower bound for
$\Pr(76 < X < 116)$.

3.80. Let $f(x) = e^{-\mu}\mu^x/x!$, $x = 0, 1, 2, \ldots$, zero elsewhere. Find the values
of $\mu$ so that $x = 1$ is the unique mode; that is, $f(0) < f(1)$ and
$f(1) > f(2) > f(3) > \cdots$.

3.81. Let $X$ and $Y$ be two independent binomial variables with parameters
$n = 4$, $p = \tfrac{1}{2}$ and $n = 3$, $p = \tfrac{2}{3}$, respectively. Determine $\Pr(X - Y = 3)$.

3.82. Let $X$ and $Y$ be two independent binomial variables, both with
parameters $n$ and $p = \tfrac{1}{2}$. Show that
$$\Pr(X - Y = 0) = \frac{(2n)!}{n!\, n!\, 2^{2n}}.$$

3.83. Two people toss a coin five independent times each. Find the
probability that they will obtain the same number of heads.

3.84. Color blindness appears in 1 percent of the people in a certain
population. How large must a sample with replacement be if the
probability of its containing at least one color-blind person is to be at least 0.95?
Assume a binomial distribution $b(n, p = 0.01)$ and find $n$.

3.85. Assume that the number $X$ of hours of sunshine per day in a certain
place has a chi-square distribution with 10 degrees of freedom. The profit
of a certain outdoor activity depends upon the number of hours of
sunshine through the formula
$$\text{profit} = 1000(1 - e^{-X/10}).$$
Find the expected level of the profit.

3.86. Place five similar balls (each either red or blue) in a bowl at random as
follows: A coin is flipped 5 independent times and a red ball is placed in the
bowl for each head and a blue ball for each tail. The bowl is then taken and
two balls are selected at random without replacement. Given that each of
those two balls is red, compute the conditional probability that 5 red balls
were placed in the bowl at random.

3.87. If a die is rolled four independent times, what is the probability of one
four, two fives, and one six, given that at least one six is produced?

3.88. Let the p.d.f. $f(x)$ be positive on, and only on, the integers
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, so that $f(x) = [(11 - x)/x] f(x - 1)$, $x = 1, 2,
3, \ldots, 10$. Find $f(x)$.

3*89. Let A" and Y have a bivariate normal distribution with 叫 = 5, = 10, 

= ij oi = 25,, and p = Compute Pr (7 < Y< \9\x = 5). 

I •• • * ' 

3.90. Say that Jim has three cents and that Bill has seven cents. A coin is
tossed ten independent times. For each head that appears, Bill pays Jim
two cents, and for each tail that appears, Jim pays Bill one cent. What
is the probability that neither person is in debt after the ten trials?

3.91. If $E(X^r) = [(r + 1)!](2^r)$, $r = 1, 2, 3, \ldots$, find the m.g.f. and p.d.f.
of $X$.

3.92. For a biased coin, say that the probability of exactly two heads in three
independent tosses is $\tfrac{4}{9}$. What is the probability of exactly six heads in nine
independent tosses of this coin?

3.93. It is discovered that 75 percent of the pages of a certain book contain
no errors. If we assume that the number of errors per page follows a Poisson
distribution, find the percentage of pages that have exactly one error.

3.94. Let $X$ have a Poisson distribution with double mode at $x = 1$ and $x = 2$.
Find $\Pr[X = 0]$.

3.95. Let $X$ and $Y$ be jointly normally distributed with $\mu_X = 20$, $\mu_Y = 40$,
$\sigma_X = 3$, $\sigma_Y = 2$, $\rho = 0.6$. Find a symmetric interval about the conditional
mean, so that the probability is 0.90 that $Y$ lies in that interval given that
$X$ equals 25.

3.96. Let $f(x) = \binom{10}{x} p^x (1 - p)^{10 - x}$, $x = 0, 1, \ldots, 10$, zero elsewhere. Find
the values of $p$ so that $f(0) > f(1) > \cdots > f(10)$.

3.97. Let $f(x, y)$ be a bivariate normal p.d.f. and let $c$ be a positive constant
so that $c < (2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2})^{-1}$. Show that $c = f(x, y)$ defines an ellipse in
the $xy$-plane.

3.98. Let $f_1(x, y)$ and $f_2(x, y)$ be two bivariate normal probability density
functions, each having means equal to zero and variances equal to 1.
The respective correlation coefficients are $\rho$ and $-\rho$. Consider the joint
distribution of $X$ and $Y$ defined by the joint p.d.f. $[f_1(x, y) + f_2(x, y)]/2$.
Show that the two marginal distributions are both $N(0, 1)$, $X$ and $Y$ are
dependent, and $E(XY) = 0$ and hence the correlation coefficient of $X$ and
$Y$ is zero.

3.99. Let $X$ be $N(\mu, \sigma^2)$. Define the random variable $Y = e^X$ and find its p.d.f.
by differentiating $G(y) = \Pr(e^X \le y) = \Pr(X \le \ln y)$. This is the p.d.f. of a
lognormal distribution.

3.100. In the proof of Theorem 1 of Section 3.4, we could let
$$G(w) = \Pr(X \le w\sigma + \mu) = F(w\sigma + \mu),$$
where $F$ and $F' = f$ are the distribution function and p.d.f. of $X$,
respectively. Then, by the chain rule,
$$g(w) = G'(w) = [F'(w\sigma + \mu)]\sigma.$$
Show that the right-hand member is the p.d.f. of a standard normal
distribution; thus this provides another proof of Theorem 1.


CHAPTER 4


Distributions of Functions
of Random Variables


4.1 Sampling Theory 

Let $X_1, X_2, \ldots, X_n$ denote $n$ random variables that have the joint
p.d.f. $f(x_1, x_2, \ldots, x_n)$. These variables may or may not be
independent. Problems such as the following are very interesting in
themselves; but more important, their solutions often provide the basis
for making statistical inferences. Let $Y$ be a random variable that is
defined by a function of $X_1, X_2, \ldots, X_n$, say $Y = u(X_1, X_2, \ldots, X_n)$.
Once the p.d.f. $f(x_1, x_2, \ldots, x_n)$ is given, can we find the p.d.f. of $Y$?
In some of the preceding chapters, we have solved a few of these
problems. Among them are the following two. If $n = 1$ and if $X_1$ is
$N(\mu, \sigma^2)$, then $Y = (X_1 - \mu)/\sigma$ is $N(0, 1)$. Let $n$ be a positive integer and
let the random variables $X_i$, $i = 1, 2, \ldots, n$, be independent, each
having the same p.d.f. $f(x) = p^x(1 - p)^{1 - x}$, $x = 0, 1$, and zero
elsewhere. If $Y = \sum_{i=1}^{n} X_i$, then $Y$ is $b(n, p)$. It should be observed that
$Y = u(X_1) = (X_1 - \mu)/\sigma$ is a function of $X_1$ that depends upon
the two parameters of the normal distribution; whereas
$Y = u(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} X_i$ does not depend upon $p$, the parameter of
the common p.d.f. of the $X_i$, $i = 1, 2, \ldots, n$. The distinction that
we make between these functions is brought out in the following
definition.

Definition 1. A function of one or more random variables that does 
not depend upon any unknown parameter is called a statistic. 

In accordance with this definition, the random variable $Y = \sum_{i=1}^{n} X_i$
discussed above is a statistic. But the random variable $Y = (X_1 - \mu)/\sigma$
is not a statistic unless $\mu$ and $\sigma$ are known numbers. It should be noted
that, although a statistic does not depend upon any unknown
parameter, the distribution of the statistic may very well depend upon
unknown parameters.

Remark. We remark, for the benefit of the more advanced reader, that a
statistic is usually defined to be a measurable function of the random variables.
In this book, however, we wish to minimize the use of measure theoretic
terminology, so we have suppressed the modifier "measurable." It is quite
clear that a statistic is a random variable. In fact, some probabilists avoid the
use of the word "statistic" altogether, and they refer to a measurable function
of random variables as a random variable. We decided to use the word
"statistic" because the reader will encounter it so frequently in books and
journals.

We can motivate the study of the distribution of a statistic in the
following way. Let a random variable $X$ be defined on a sample space
$\mathcal{C}$ and let the space of $X$ be denoted by $\mathcal{A}$. In many situations
the distribution of $X$ is not completely known. For instance,
we may know the distribution except for the value of an
unknown parameter. To obtain more information about this
distribution (or the unknown parameter), we shall repeat under identical
conditions the random experiment $n$ independent times. Let the
random variable $X_i$ be a function of the $i$th outcome, $i = 1, 2, \ldots, n$.
Then we call $X_1, X_2, \ldots, X_n$ the observations of a random sample
from the distribution under consideration. Suppose that we can define
a statistic $Y = u(X_1, X_2, \ldots, X_n)$ whose p.d.f. is found to be $g(y)$.
Perhaps this p.d.f. shows that there is a great probability that $Y$ has
a value close to the unknown parameter. Once the experiment has been
repeated in the manner indicated and we have $X_1 = x_1, \ldots, X_n = x_n$,
then $y = u(x_1, x_2, \ldots, x_n)$ is a known number. It is to be hoped that
this known number can in some manner be used to elicit information
about the unknown parameter. Thus a statistic may prove to be useful.

Remarks. Let the random variable $X$ be defined as the diameter of a hole
to be drilled by a certain drill press and let it be assumed that $X$ has a normal
distribution. Past experience with many drill presses makes this assumption
plausible; but the assumption does not specify the mean $\mu$ nor the variance
$\sigma^2$ of this normal distribution. The only way to obtain information about $\mu$
and $\sigma^2$ is to have recourse to experimentation. Thus we shall drill a number,
say $n = 20$, of these holes whose diameters will be $X_1, X_2, \ldots, X_{20}$. Then
$X_1, X_2, \ldots, X_{20}$ is a random sample from the normal distribution under
consideration. Once the holes have been drilled and the diameters measured,
the 20 numbers may be used, as will be seen later, to elicit information about
$\mu$ and $\sigma^2$.

The term "random sample" is now defined in a more formal
manner.

Definition 2. Let $X_1, X_2, \ldots, X_n$ denote $n$ independent random
variables, each of which has the same but possibly unknown
p.d.f. $f(x)$; that is, the probability density functions of $X_1, X_2, \ldots, X_n$
are, respectively, $f_1(x_1) = f(x_1)$, $f_2(x_2) = f(x_2)$, $\ldots$, $f_n(x_n) = f(x_n)$, so
that the joint p.d.f. is $f(x_1)f(x_2) \cdots f(x_n)$. The random variables
$X_1, X_2, \ldots, X_n$ are then said to constitute a random sample from
a distribution that has p.d.f. $f(x)$. That is, the observations of a
random sample are independent and identically distributed (often
abbreviated i.i.d.).

Later we shall define what we mean by a random sample from a
distribution of more than one random variable.

Sometimes it is convenient to refer to a random sample of size
$n$ from a given distribution and, as has been remarked, to refer
to $X_1, X_2, \ldots, X_n$ as the observations of the random sample. A
reexamination of Example 2 of Section 2.5 reveals that we found the
p.d.f. of the statistic, which is the maximum of the observations
of a random sample from a distribution with p.d.f.
$f(x) = 2x$, $0 < x < 1$, zero elsewhere. In Section 3.1 we found the
p.d.f. of the statistic, which is the sum of the observations of a random
sample of size $n$ from a distribution that has p.d.f. $f(x) = p^x(1 - p)^{1 - x}$,
$x = 0, 1$, zero elsewhere. This fact was also referred to at the beginning
of this section.

In this book, most of the statistics that we shall encounter will
be functions of the observations of a random sample from a given
distribution. Next, we define two important statistics of this type.

Definition 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$
from a given distribution. The statistic
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \sum_{i=1}^{n} \frac{X_i}{n}$$
is called the mean of the random sample, and the statistic
$$S^2 = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{n}$$
is called the variance of the random sample.

Remarks. Many writers do not define the variance of a random sample
as we have done but, instead, they take $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/(n - 1)$. There are
good reasons for doing this. But a certain price has to be paid, as we shall
indicate. Let $x_1, x_2, \ldots, x_n$ denote experimental values of the random variable
$X$ that has the p.d.f. $f(x)$ and the distribution function $F(x)$. Thus we may look
upon $x_1, x_2, \ldots, x_n$ as the experimental values of a random sample of size $n$
from the given distribution. The distribution of the sample is then defined to
be the distribution obtained by assigning a probability of $1/n$ to each of
the points $x_1, x_2, \ldots, x_n$. This is a distribution of the discrete type. The
corresponding distribution function will be denoted by $F_n(x)$ and it is a step
function. If we let $f_x$ denote the number of sample values that are less than
or equal to $x$, then $F_n(x) = f_x/n$, so that $F_n(x)$ gives the relative frequency of
the event $X \le x$ in the set of $n$ observations. The function $F_n(x)$ is often called
the "empirical distribution function" and it has a number of uses.

Because the distribution of the sample is a discrete distribution, the mean
and the variance have been defined and are, respectively, $\sum_{i=1}^{n} x_i/n = \bar{x}$ and
$\sum_{i=1}^{n} (x_i - \bar{x})^2/n = s^2$. Thus, if one finds the distribution of the sample and the
associated empirical distribution function to be useful concepts, it would
seem logically inconsistent to define the variance of a random sample in any
way other than we have.

We have also defined $\bar{X}$ and $S^2$ only for observations that are i.i.d., that
is, when $X_1, X_2, \ldots, X_n$ denote a random sample. However, statisticians often
use these symbols, $\bar{X}$ and $S^2$, even if the assumption of independence is
dropped. For example, suppose that $X_1, X_2, \ldots, X_n$ were the observations
taken at random from a finite collection of numbers without replacement.
These observations could be thought of as a sample and its mean $\bar{X}$ and
variance $S^2$ computed; yet $X_1, X_2, \ldots, X_n$ are dependent. Moreover, the $n$
observations could simply be some values, not necessarily taken from a
distribution, and we could compute the mean $\bar{X}$ and the variance $S^2$ associated
with these $n$ values. If we do these things, however, we must recognize the
conditions under which the observations were obtained, and we cannot make
the same statements that are associated with the mean and the variance of
what we call a random sample.
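
To make the preceding remarks concrete, here is a minimal sketch that computes the sample mean, the sample variance with divisor $n$ (the definition used in this book), the alternative with divisor $n - 1$, and the empirical distribution function $F_n(x)$. The function names and the five data values are hypothetical and serve only as an illustration.

```python
def sample_mean(xs):
    """Mean of the sample: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs, divisor_n=True):
    """Variance of the sample.

    divisor_n=True divides by n (the definition used in this book);
    divisor_n=False divides by n - 1 (the alternative many writers use).
    """
    n = len(xs)
    xbar = sample_mean(xs)
    sum_sq = sum((x - xbar) ** 2 for x in xs)
    return sum_sq / n if divisor_n else sum_sq / (n - 1)

def empirical_cdf(xs, x):
    """F_n(x): relative frequency of sample values less than or equal to x."""
    return sum(1 for v in xs if v <= x) / len(xs)

data = [2.0, 4.0, 4.0, 5.0, 10.0]  # hypothetical observed values
print(sample_mean(data))
print(sample_variance(data), sample_variance(data, divisor_n=False))
print(empirical_cdf(data, 4.0))
```

With divisor $n$, the value returned by `sample_variance` is exactly the variance of the discrete distribution that puts probability $1/n$ on each observed value, which is the consistency argument made in the remarks above.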

Random sampling distribution theory means the general problem
of finding distributions of functions of the observations of a random
sample. Up to this point, the only method, other than direct
probabilistic arguments, of finding the distribution of a function of one
or more random variables is the distribution function technique.
That is, if $X_1, X_2, \ldots, X_n$ are random variables, the distribution of
$Y = u(X_1, X_2, \ldots, X_n)$ is determined by computing the distribution
function of $Y$,
$$G(y) = \Pr[u(X_1, X_2, \ldots, X_n) \le y].$$
Even in what superficially appears to be a very simple problem, this
can be quite tedious. This fact is illustrated in the next paragraph.

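Let $X_1, X_2, X_3$ denote a random sample of size 3 from a standard
normal distribution. Let $Y$ denote the statistic that is the sum of
the squares of the sample observations. The distribution function
of $Y$ is
$$G(y) = \Pr(X_1^2 + X_2^2 + X_3^2 \le y).$$
If $y < 0$, then $G(y) = 0$. However, if $y \ge 0$, then
$$G(y) = \iiint_A \frac{1}{(2\pi)^{3/2}} \exp\left[-\frac{1}{2}(x_1^2 + x_2^2 + x_3^2)\right] dx_1\, dx_2\, dx_3,$$
where $A$ is the set of points $(x_1, x_2, x_3)$ interior to, or on the surface of,
a sphere with center at $(0, 0, 0)$ and radius equal to $\sqrt{y}$. This is
not a simple integral. We might hope to make progress by changing
to spherical coordinates:
$$x_1 = \rho \cos\theta \sin\varphi, \quad x_2 = \rho \sin\theta \sin\varphi, \quad x_3 = \rho \cos\varphi,$$
where $\rho \ge 0$, $0 \le \theta < 2\pi$, $0 \le \varphi \le \pi$. Then, for $y \ge 0$,
$$G(y) = \int_0^{\sqrt{y}} \int_0^{2\pi} \int_0^{\pi} \frac{1}{(2\pi)^{3/2}}\, e^{-\rho^2/2} \rho^2 \sin\varphi\, d\varphi\, d\theta\, d\rho = \sqrt{\frac{2}{\pi}} \int_0^{\sqrt{y}} \rho^2 e^{-\rho^2/2}\, d\rho.$$
If we change the variable of integration by setting $\rho = \sqrt{w}$, we have
$$G(y) = \frac{1}{\sqrt{2\pi}} \int_0^{y} \sqrt{w}\, e^{-w/2}\, dw,$$
for $y \ge 0$. Since $Y$ is a random variable of the continuous type, the
p.d.f. of $Y$ is $g(y) = G'(y)$. Thus
$$g(y) = \frac{1}{\sqrt{2\pi}}\, y^{3/2 - 1} e^{-y/2}, \quad 0 < y < \infty, \qquad = 0 \text{ elsewhere.}$$
Because $\Gamma(\tfrac{3}{2}) = (\tfrac{1}{2})\Gamma(\tfrac{1}{2}) = (\tfrac{1}{2})\sqrt{\pi}$, and thus $\sqrt{2\pi} = \Gamma(\tfrac{3}{2})2^{3/2}$, we see that
$Y$ is $\chi^2(3)$.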

The problem that we have just solved highlights the desirability of
having, if possible, various methods of determining the distribution of
a function of random variables. We shall find that other techniques are
available and that often a particular technique is vastly superior to the
others in a given situation. These techniques will be discussed in
subsequent sections.
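
As an informal check on the distribution function technique used above, the following sketch compares a numerical integral of the derived p.d.f. $g(y) = y^{1/2} e^{-y/2}/\sqrt{2\pi}$ with a simulated relative frequency of the event $X_1^2 + X_2^2 + X_3^2 \le y$. The function names, the midpoint rule, the number of trials, and the seed are our own choices, not part of the text.

```python
import math
import random

def g(y):
    """Derived p.d.f.: y^(3/2 - 1) * exp(-y/2) / sqrt(2*pi) for y > 0."""
    if y <= 0:
        return 0.0
    return math.sqrt(y) * math.exp(-y / 2.0) / math.sqrt(2.0 * math.pi)

def G_numeric(y, steps=2000):
    """Midpoint-rule approximation of the integral of g over (0, y)."""
    h = y / steps
    return sum(g((k + 0.5) * h) for k in range(steps)) * h

def G_simulated(y, trials=100_000, seed=0):
    """Relative frequency of X1^2 + X2^2 + X3^2 <= y for N(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        if total <= y:
            hits += 1
    return hits / trials

y = 3.0
print(G_numeric(y), G_simulated(y))
```

The two numbers should agree to about two decimal places; the simulation carries sampling error, so exact agreement is not expected.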

Example 1. Let the random variable $Y$ be distributed uniformly over the
unit interval $0 < y < 1$; that is, the distribution function of $Y$ is
$$G(y) = \begin{cases} 0, & y \le 0, \\ y, & 0 < y < 1, \\ 1, & 1 \le y. \end{cases}$$
Suppose that $F(x)$ is a distribution function of the continuous type which is
strictly increasing when $0 < F(x) < 1$. If we define the random variable $X$
by the relationship $Y = F(X)$, we now show that $X$ has a distribution
which corresponds to $F(x)$. If $0 < F(x) < 1$, the inequalities $X \le x$ and
$F(X) \le F(x)$ are equivalent. Thus, with $0 < F(x) < 1$, the distribution of $X$ is
$$\Pr(X \le x) = \Pr[F(X) \le F(x)] = \Pr[Y \le F(x)]$$
because $Y = F(X)$. However, $\Pr(Y \le y) = G(y)$, so we have
$$\Pr(X \le x) = G[F(x)] = F(x), \quad 0 < F(x) < 1.$$
That is, the distribution function of $X$ is $F(x)$.

This result permits us to simulate random variables of different
types. This is done by simply determining values of the uniform
variable $Y$, usually with a computer. Then, after determining the
observed value $Y = y$, solve the equation $y = F(x)$, either explicitly or
by numerical methods. This yields the inverse function $x = F^{-1}(y)$. By
the preceding result, this number $x$ will be an observed value of $X$ that
has distribution function $F(x)$.
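
The simulation recipe just described can be sketched as follows. As an assumed illustration we take the distribution function $F(x) = 1 - e^{-x}$, $0 < x < \infty$, for which $y = F(x)$ can be solved explicitly as $x = -\ln(1 - y)$; the function names and the seed are hypothetical.

```python
import math
import random

def inverse_exponential_cdf(y):
    """Solve y = F(x) = 1 - exp(-x) for x; valid for 0 <= y < 1."""
    return -math.log(1.0 - y)

def simulate_exponential(n, seed=0):
    """Generate n observed values of X by transforming uniform values."""
    rng = random.Random(seed)
    # rng.random() returns a value in [0, 1), so 1 - y is never zero.
    return [inverse_exponential_cdf(rng.random()) for _ in range(n)]

sample = simulate_exponential(50_000)
print(sum(sample) / len(sample))  # should be close to the mean, 1
```

When $y = F(x)$ cannot be solved in closed form, the same idea applies with a numerical root finder in place of `inverse_exponential_cdf`.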

It is also interesting to note that the converse of this result is true.
If $X$ has distribution function $F(x)$ of the continuous type, then
$Y = F(X)$ is uniformly distributed over $0 < y < 1$. The reason for this
is, for $0 < y < 1$, that
$$\Pr(Y \le y) = \Pr[F(X) \le y] = \Pr[X \le F^{-1}(y)].$$
However, it is given that $\Pr(X \le x) = F(x)$, so
$$\Pr(Y \le y) = F[F^{-1}(y)] = y, \quad 0 < y < 1.$$
This is the distribution function of a random variable that is
distributed uniformly on the interval (0, 1).
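
The converse can also be examined by simulation. In this sketch, which assumes the distribution function $F(x) = 1 - e^{-x}$ purely for illustration, we generate values of $X$ with the standard library's `random.expovariate`, apply $F$, and compare the relative frequency of $F(X) \le y$ with $y$. All names and the seed are our own.

```python
import math
import random

def exponential_cdf(x):
    """F(x) = 1 - exp(-x) for x > 0, zero otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def transformed_sample(n, seed=0):
    """Values of Y = F(X), where X has distribution function F."""
    rng = random.Random(seed)
    return [exponential_cdf(rng.expovariate(1.0)) for _ in range(n)]

def fraction_at_most(values, y):
    """Relative frequency of the event Y <= y."""
    return sum(1 for v in values if v <= y) / len(values)

ys = transformed_sample(50_000)
for y in (0.25, 0.5, 0.75):
    # For a uniform distribution on (0, 1), each fraction should be near y.
    print(y, round(fraction_at_most(ys, y), 3))
```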

EXERCISES

4.1. Show that
$$\frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2,$$
where $\bar{X} = \sum_{i=1}^{n} X_i/n$.

4.2. Find the probability that exactly four observations of a random
sample of size 5 from the distribution having p.d.f. $f(x) = (x + 1)/2$,
$-1 < x < 1$, zero elsewhere, exceed zero.

4.3. Let $X_1, X_2, X_3$ be a random sample of size 3 from a distribution that
is $N(6, 4)$. Determine the probability that the largest sample observation
exceeds 8.

4.4. What is the probability that at least one observation of a random
sample of size $n = 5$ from a continuous-type distribution exceeds the
90th percentile?

4.5. Let $X$ have the p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Show that
$Y = -2 \ln X^4$ is $\chi^2(2)$.

4.6. Let $X_1, X_2$ be a random sample of size $n = 2$ from a distribution with
p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero elsewhere. Find the mean and the variance
of the ratio $Y = X_1/X_2$.

Hint: First find the distribution function $\Pr(Y \le y)$ when $0 < y < 1$ and
then when $1 \le y$.

4,7, Let Xi ， be a random sample from the distribution having 

p,d,f, f(x) = 2x, 0 <x <], zero elsewhere. Find Pr (XJX 2 < and 
Pr {X\X 2 > |)- 

I 

4.8* If the sample size is a = 2, find the constant c so that S 2 = c(Z, - X 2 f. 

4.9* If xf i ，/ = 1 ， 2, ， .. ， a, compute the values of Jc = 2 x“n and 
s 1 (x t — x) 2 jn w 

4.1JK Let a + bx h i = 1 ， 2, “ • ， /a, where a and b are constants* Find 

and = £ (y t . - yfjn in terms of a 、 b, x = £ xjn, and 

4J1. Let X x and X 2 denote two U.d* random variables, each from a 
distribution that is JV(0, 1). Find the p.dX of Y^X 2 { + X\, 

Hint: In the double integral representing Pr (7 < y), use polar 
coordinates. 

4.12. The four values ^ = 0,42, y 2 = 0.31 ， ^ = 0.87, and y 4 = 0,65 represent 
the observed values of a random sample of size tt = 4 from the uniform 
distribution over 0 < 7 < J. Using these four values, find a corresponding 
observed random sample from a distribution that has p.d.f J{x) = e~\ 

0 < x < co, zero elsewhere. 

4*13. LetJf H X 2 denote a random sample of si 沈 2 from a distribution with 
p‘d.f. /(jt) = 0 < jt < 2, zero elsewhere. Find the joint p,d*f. of X { and AV 

Let Y = X { + X 2 . Find the distribution function and the p + d.f of F, 

4*14, Let denote a random sample of size 2 from a distribution with 

pAS. f(x) ^ 1 , 0 < x < h zero elsewhere. Find the distribution function 
and the p,dX of Y= X l /X 2 . 

4.15. Let $X_1, X_2, X_3$ be three i.i.d. random variables, each from a distribution
having p.d.f. $f(x) = 5x^4$, $0 < x < 1$, zero elsewhere. Let $Y$ be the
largest observation in the sample. Find the distribution function and p.d.f.
of $Y$.

4.16. Let $X_1$ and $X_2$ be observations of a random sample from a distribution
with p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere. Evaluate the conditional
probability $\Pr(X_1 < X_2 \mid X_1 < 2X_2)$.


4.2 Transformations of Variables of the Discrete Type

An alternative method of finding the distribution of a function 
of one or more random variables is called the change-of-variable 
technique. There are some delicate questions (with particular reference 
to random variables of the continuous type) involved in this technique, 
and these make it desirable for us first to consider special cases. 

Let $X$ have the Poisson p.d.f.

$$\begin{aligned} f(x) &= \frac{\mu^x e^{-\mu}}{x!}, \qquad x = 0, 1, 2, \ldots, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$


As we have done before, let $\mathscr{A}$ denote the space $\mathscr{A} = \{x : x = 0, 1, 2, \ldots\}$,
so that $\mathscr{A}$ is the set where $f(x) > 0$. Define a new
random variable $Y$ by $Y = 4X$. We wish to find the p.d.f. of $Y$ by
the change-of-variable technique. Let $y = 4x$. We call $y = 4x$ a
transformation from $x$ to $y$, and we say that the transformation maps
the space $\mathscr{A}$ onto the space $\mathscr{B} = \{y : y = 0, 4, 8, 12, \ldots\}$. The space $\mathscr{B}$
is obtained by transforming each point in $\mathscr{A}$ in accordance with $y = 4x$.
We note two things about this transformation. It is such that to each
point in $\mathscr{A}$ there corresponds one, and only one, point in $\mathscr{B}$; and
conversely, to each point in $\mathscr{B}$ there corresponds one, and only one,
point in $\mathscr{A}$. That is, the transformation $y = 4x$ sets up a one-to-one
correspondence between the points of $\mathscr{A}$ and those of $\mathscr{B}$. Any function
$y = u(x)$ (not merely $y = 4x$) that maps a space $\mathscr{A}$ (not merely our $\mathscr{A}$)
onto a space $\mathscr{B}$ (not merely our $\mathscr{B}$) such that there is a one-to-one
correspondence between the points of $\mathscr{A}$ and those of $\mathscr{B}$ is called a
one-to-one transformation. It is important to note that a one-to-one
transformation, $y = u(x)$, implies that $x$ is a single-valued function of
$y$. In our case this is obviously true, since $y = 4x$ requires that $x = (\tfrac{1}{4})y$.

Our problem is that of finding the p.d.f. $g(y)$ of the discrete type
of random variable $Y = 4X$. Now $g(y) = \Pr(Y = y)$. Because there is
a one-to-one correspondence between the points of $\mathscr{A}$ and those of
$\mathscr{B}$, the event $Y = y$ or $4X = y$ can occur when, and only when, the event
$X = (\tfrac{1}{4})y$ occurs. That is, the two events are equivalent and have the
same probability. Hence

$$\begin{aligned} g(y) = \Pr(Y = y) = \Pr\left(X = \frac{y}{4}\right) &= \frac{\mu^{y/4} e^{-\mu}}{(y/4)!}, \qquad y = 0, 4, 8, \ldots, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

The foregoing detailed discussion should make the subsequent text
easier to read. Let $X$ be a random variable of the discrete type, having
p.d.f. $f(x)$. Let $\mathscr{A}$ denote the set of discrete points, at each of which
$f(x) > 0$, and let $y = u(x)$ define a one-to-one transformation that maps
$\mathscr{A}$ onto $\mathscr{B}$. If we solve $y = u(x)$ for $x$ in terms of $y$, say, $x = w(y)$, then
for each $y \in \mathscr{B}$, we have $x = w(y) \in \mathscr{A}$. Consider the random variable
$Y = u(X)$. If $y \in \mathscr{B}$, then $x = w(y) \in \mathscr{A}$, and the events $Y = y$ [or
$u(X) = y$] and $X = w(y)$ are equivalent. Accordingly, the p.d.f. of $Y$ is

$$\begin{aligned} g(y) = \Pr(Y = y) = \Pr[X = w(y)] &= f[w(y)], \qquad y \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$
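
The rule $g(y) = f[w(y)]$ can be tried out on the Poisson illustration $Y = 4X$. The following is a minimal sketch, not part of the original text; the mean `mu = 2.0`, the truncation of the support at 60 points, and the helper names `poisson_pmf` and `transformed_pmf` are assumptions made only for illustration.

```python
import math

def poisson_pmf(x, mu):
    """Poisson p.d.f. f(x) = mu^x e^{-mu} / x! for x = 0, 1, 2, ..."""
    return mu ** x * math.exp(-mu) / math.factorial(x)

def transformed_pmf(f, w, support_y):
    """Return g(y) = f(w(y)) on the support of Y (one-to-one case only)."""
    return {y: f(w(y)) for y in support_y}

mu = 2.0  # hypothetical Poisson mean, chosen only for illustration
# Y = 4X, so the inverse is x = w(y) = y / 4 on y = 0, 4, 8, ...
support_y = [4 * k for k in range(60)]  # truncated support (an assumption)
g = transformed_pmf(lambda x: poisson_pmf(x, mu), lambda y: y // 4, support_y)

print(g[8])             # Pr(Y = 8), which equals Pr(X = 2)
print(sum(g.values()))  # close to 1 because the truncated tail is tiny
```

No Jacobian factor appears here: in the discrete case the probabilities are simply carried from each point $x$ to its image $y$.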

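Example 1. Let $X$ have the binomial p.d.f.

$$\begin{aligned} f(x) &= \frac{3!}{x!\,(3 - x)!} \left(\frac{2}{3}\right)^{x} \left(\frac{1}{3}\right)^{3 - x}, \qquad x = 0, 1, 2, 3, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

We seek the p.d.f. of the random variable $Y = X^2$. The transformation
$y = u(x) = x^2$ maps $\mathscr{A} = \{x : x = 0, 1, 2, 3\}$ onto $\mathscr{B} = \{y : y = 0, 1, 4, 9\}$. In
general, $y = x^2$ does not define a one-to-one transformation; here, however, it
does, for there are no negative values of $x$ in $\mathscr{A} = \{x : x = 0, 1, 2, 3\}$. That is,
we have the single-valued inverse function $x = w(y) = \sqrt{y}$ (not $-\sqrt{y}$), and

$$\begin{aligned} g(y) = f(\sqrt{y}) &= \frac{3!}{(\sqrt{y})!\,(3 - \sqrt{y})!} \left(\frac{2}{3}\right)^{\sqrt{y}} \left(\frac{1}{3}\right)^{3 - \sqrt{y}}, \qquad y = 0, 1, 4, 9, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$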

There are no essential difficulties involved in a problem like
the following. Let $f(x_1, x_2)$ be the joint p.d.f. of two discrete-type
random variables $X_1$ and $X_2$ with $\mathscr{A}$ the (two-dimensional) set of
points at which $f(x_1, x_2) > 0$. Let $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$
define a one-to-one transformation that maps $\mathscr{A}$ onto $\mathscr{B}$. The joint
p.d.f. of the two new random variables $Y_1 = u_1(X_1, X_2)$ and $Y_2 = u_2(X_1, X_2)$
is given by

$$\begin{aligned} g(y_1, y_2) &= f[w_1(y_1, y_2), w_2(y_1, y_2)], \qquad (y_1, y_2) \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere,} \end{aligned}$$

where $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$ is the single-valued inverse of
$y_1 = u_1(x_1, x_2)$, $y_2 = u_2(x_1, x_2)$. From this joint p.d.f. $g(y_1, y_2)$ we may
obtain the marginal p.d.f. of $Y_1$ by summing on $y_2$ or the marginal
p.d.f. of $Y_2$ by summing on $y_1$.

Perhaps it should be emphasized that the technique of change
of variables involves the introduction of as many "new" variables
as there were "old" variables. That is, suppose that $f(x_1, x_2, x_3)$ is the
joint p.d.f. of $X_1$, $X_2$, and $X_3$, with $\mathscr{A}$ the set where $f(x_1, x_2, x_3) > 0$.
Let us say we seek the p.d.f. of $Y_1 = u_1(X_1, X_2, X_3)$. We would then
define (if possible) $Y_2 = u_2(X_1, X_2, X_3)$ and $Y_3 = u_3(X_1, X_2, X_3)$, so
that $y_1 = u_1(x_1, x_2, x_3)$, $y_2 = u_2(x_1, x_2, x_3)$, $y_3 = u_3(x_1, x_2, x_3)$ define a
one-to-one transformation of $\mathscr{A}$ onto $\mathscr{B}$. This would enable us to find
the joint p.d.f. of $Y_1$, $Y_2$, and $Y_3$ from which we would get the marginal
p.d.f. of $Y_1$ by summing on $y_2$ and $y_3$.

Example 2. Let $X_1$ and $X_2$ be two independent random variables that have
Poisson distributions with means $\mu_1$ and $\mu_2$, respectively. The joint p.d.f. of
$X_1$ and $X_2$ is

$$\frac{\mu_1^{x_1} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{x_1!\, x_2!}, \qquad x_1 = 0, 1, 2, 3, \ldots, \quad x_2 = 0, 1, 2, 3, \ldots,$$

and is zero elsewhere. Thus the space $\mathscr{A}$ is the set of points $(x_1, x_2)$, where
each of $x_1$ and $x_2$ is a nonnegative integer. We wish to find the p.d.f. of
$Y_1 = X_1 + X_2$. If we use the change of variable technique, we need to define
a second random variable $Y_2$. Because $Y_2$ is of no interest to us, let us
choose it in such a way that we have a simple one-to-one transformation.
For example, take $Y_2 = X_2$. Then $y_1 = x_1 + x_2$ and $y_2 = x_2$ represent a
one-to-one transformation that maps $\mathscr{A}$ onto

$$\mathscr{B} = \{(y_1, y_2) : y_2 = 0, 1, \ldots, y_1 \text{ and } y_1 = 0, 1, 2, \ldots\}.$$

Note that, if $(y_1, y_2) \in \mathscr{B}$, then $0 \le y_2 \le y_1$. The inverse functions are given by
$x_1 = y_1 - y_2$ and $x_2 = y_2$. Thus the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \frac{\mu_1^{y_1 - y_2} \mu_2^{y_2} e^{-\mu_1 - \mu_2}}{(y_1 - y_2)!\, y_2!}, \qquad (y_1, y_2) \in \mathscr{B},$$

and is zero elsewhere. Consequently, the marginal p.d.f. of $Y_1$ is given by

$$\begin{aligned} g_1(y_1) &= \sum_{y_2 = 0}^{y_1} g(y_1, y_2) \\ &= \frac{e^{-\mu_1 - \mu_2}}{y_1!} \sum_{y_2 = 0}^{y_1} \frac{y_1!}{(y_1 - y_2)!\, y_2!}\, \mu_1^{y_1 - y_2} \mu_2^{y_2} \\ &= \frac{(\mu_1 + \mu_2)^{y_1} e^{-\mu_1 - \mu_2}}{y_1!}, \qquad y_1 = 0, 1, 2, \ldots, \end{aligned}$$

and is zero elsewhere. That is, $Y_1 = X_1 + X_2$ has a Poisson distribution with
parameter $\mu_1 + \mu_2$.
Remark. It should be noted that Example 2 essentially illustrates the
distribution function technique too. That is, without defining $Y_2 = X_2$, we
have that the distribution function of $Y_1 = X_1 + X_2$ is

$$G_1(y_1) = \Pr(X_1 + X_2 \le y_1).$$

In this discrete case, with $y_1 = 0, 1, 2, \ldots$, the p.d.f. of $Y_1$ is equal to

$$g_1(y_1) = G_1(y_1) - G_1(y_1 - 1) = \Pr(X_1 + X_2 = y_1).$$

That is,

$$g_1(y_1) = \sum\sum_{x_1 + x_2 = y_1} \frac{\mu_1^{x_1} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{x_1!\, x_2!}.$$

This summation is over all points of $\mathscr{A}$ such that $x_1 + x_2 = y_1$ and thus can
be written as

$$g_1(y_1) = \sum_{x_2 = 0}^{y_1} \frac{\mu_1^{y_1 - x_2} \mu_2^{x_2} e^{-\mu_1 - \mu_2}}{(y_1 - x_2)!\, x_2!},$$

which is exactly the summation given in Example 2.

Example 3. In Section 4.1, we found that we could simulate a
continuous-type random variable $X$ with distribution function $F(x)$ through
$X = F^{-1}(Y)$, where $Y$ has a uniform distribution on $0 < y < 1$. In a sense, we
can simulate a discrete-type random variable $X$ in much the same way, but
we must understand what $X = F^{-1}(Y)$ means in this case. Here $F(x)$ is a step
function with the height of the step at $x = x_0$ equal to $\Pr(X = x_0)$. For
illustration, in Example 3 of Section 1.5, $\Pr(X = 3) = \tfrac{1}{2}$ is the height of the
step at $x = 3$ in Figure 1.3 that depicts the distribution function. If we now
think of selecting a random point $Y$, having the uniform distribution on
$0 < y < 1$, on the vertical axis of Figure 1.3, the probability of falling between
$\tfrac{1}{2}$ and 1 is $\tfrac{1}{2}$. However, if it falls between those two values, the horizontal line
drawn from it would "hit" the step at $x = 3$. That is, for $\tfrac{1}{2} < y < 1$, then
$F^{-1}(y) = 3$. Of course, if $\tfrac{1}{6} < y < \tfrac{1}{2}$, then $F^{-1}(y) = 2$; and if $0 < y < \tfrac{1}{6}$, we have
$F^{-1}(y) = 1$. Thus, with this procedure, we can generate the numbers $x = 1$,
$x = 2$, and $x = 3$ with respective probabilities $\tfrac{1}{6}$, $\tfrac{1}{3}$, and $\tfrac{1}{2}$, as we desired.
Clearly, this procedure can be generalized to simulate any random variable
$X$ of the discrete type.
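
The step-function reading of $X = F^{-1}(Y)$ can be sketched in a few lines. This is an illustration under stated assumptions: the three-point p.d.f. `probs`, the seed, and the function name `simulate_discrete` are hypothetical and are not taken from the text.

```python
import random
from collections import Counter

def simulate_discrete(values, probs, y):
    """Return F^{-1}(y): the first value whose cumulative probability exceeds y."""
    cumulative = 0.0
    for value, p in zip(values, probs):
        cumulative += p
        if y < cumulative:
            return value
    return values[-1]  # guards against round-off when y is very close to 1

values = [1, 2, 3]
probs = [0.2, 0.3, 0.5]  # hypothetical p.d.f., chosen only for illustration

rng = random.Random(12345)  # fixed seed so the run is repeatable
draws = [simulate_discrete(values, probs, rng.random()) for _ in range(100_000)]
freq = Counter(draws)
rel_freq = {v: freq[v] / len(draws) for v in values}
print(rel_freq)  # relative frequencies should be near probs
```

A uniform value $y$ is compared with the running total of the probabilities, which is the same as drawing a horizontal line at height $y$ and reading off the step it hits.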


EXERCISES 

4.17. Let $X$ have a p.d.f. $f(x) = \ldots$, $x = 1, 2, 3$, zero elsewhere. Find the p.d.f.
of $Y = 2X + 1$.

4.18. If $f(x_1, x_2) = (\tfrac{2}{3})^{x_1 + x_2} (\tfrac{1}{3})^{2 - x_1 - x_2}$, $(x_1, x_2) = (0, 0), (0, 1), (1, 0), (1, 1)$,
zero elsewhere, is the joint p.d.f. of $X_1$ and $X_2$, find the joint p.d.f. of
$Y_1 = X_1 - X_2$ and $Y_2 = X_1 + X_2$.

4.19. Let $X$ have the p.d.f. $f(x) = (\tfrac{1}{2})^x$, $x = 1, 2, 3, \ldots$, zero elsewhere. Find
the p.d.f. of $Y = X^3$.

4.20. Let $X_1$ and $X_2$ have the joint p.d.f. $f(x_1, x_2) = x_1 x_2/36$, $x_1 = 1, 2, 3$ and
$x_2 = 1, 2, 3$, zero elsewhere. Find first the joint p.d.f. of $Y_1 = X_1 X_2$ and
$Y_2 = X_2$, and then find the marginal p.d.f. of $Y_1$.

4.21. Let the independent random variables $X_1$ and $X_2$ be $b(n_1, p)$ and $b(n_2, p)$,
respectively. Find the joint p.d.f. of $Y_1 = X_1 + X_2$ and $Y_2 = X_2$, and then
find the marginal p.d.f. of $Y_1$.
Hint: Use the fact that

$$\sum_{w = 0}^{k} \binom{n_1}{w} \binom{n_2}{k - w} = \binom{n_1 + n_2}{k}.$$

This can be proved by comparing the coefficients of $x^k$ in each member of
the identity $(1 + x)^{n_1} (1 + x)^{n_2} = (1 + x)^{n_1 + n_2}$.

4.22. Let $X_1$ and $X_2$ be independent random variables of the discrete type with
joint p.d.f. $f_1(x_1) f_2(x_2)$, $(x_1, x_2) \in \mathscr{A}$. Let $y_1 = u_1(x_1)$ and $y_2 = u_2(x_2)$ denote
a one-to-one transformation that maps $\mathscr{A}$ onto $\mathscr{B}$. Show that $Y_1 = u_1(X_1)$
and $Y_2 = u_2(X_2)$ are independent.

4.23. Consider the random variable $X$ with p.d.f. $f(x) = x/15$, $x = 1, 2, 3, 4, 5$,
and zero elsewhere.

(a) Graph the distribution function $F(x)$ of $X$.

(b) Using a computer or a table of random numbers, determine 30 values
of $Y$, which has the (approximate) uniform distribution on $0 < y < 1$.

(c) From these 30 values of $Y$, find the corresponding 30 values of $X$ and
determine the relative frequencies of $x = 1$, $x = 2$, $x = 3$, $x = 4$, and
$x = 5$. How do these compare to the respective probabilities of $\tfrac{1}{15}$, $\tfrac{2}{15}$,
$\tfrac{3}{15}$, $\tfrac{4}{15}$, $\tfrac{5}{15}$?

4.24. Using the technique given in Example 3 and Exercise 4.23, generate 50
values having a Poisson distribution with $\mu = 1$.

Hint: Use Table I in Appendix B.

4.3 Transformations of Variables of the Continuous Type

In the preceding section we introduced the notion of a one-to-one
transformation and the mapping of a set $\mathscr{A}$ onto a set $\mathscr{B}$ under that
transformation. Those ideas were sufficient to enable us to find the
distribution of a function of several random variables of the discrete
type. In this section we examine the same problem when the random
variables are of the continuous type. It is again helpful to begin with
a special problem.


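Example 1. Let $X$ be a random variable of the continuous type, having
p.d.f.

$$\begin{aligned} f(x) &= 2x, \qquad 0 < x < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Here $\mathscr{A}$ is the space $\{x : 0 < x < 1\}$, where $f(x) > 0$. Define the random
variable $Y$ by $Y = 8X^3$ and consider the transformation $y = 8x^3$. Under
the transformation $y = 8x^3$, the set $\mathscr{A}$ is mapped onto the set
$\mathscr{B} = \{y : 0 < y < 8\}$, and, moreover, the transformation is one-to-one. For every
$0 < a < b < 8$, the event $a < Y < b$ will occur when, and only when, the
event $\tfrac{1}{2}\sqrt[3]{a} < X < \tfrac{1}{2}\sqrt[3]{b}$ occurs because there is a one-to-one correspondence
between the points of $\mathscr{A}$ and $\mathscr{B}$. Thus

$$\Pr(a < Y < b) = \Pr\left(\tfrac{1}{2}\sqrt[3]{a} < X < \tfrac{1}{2}\sqrt[3]{b}\right) = \int_{\sqrt[3]{a}/2}^{\sqrt[3]{b}/2} 2x \, dx.$$

Let us rewrite this integral by changing the variable of integration from $x$ to
$y$ by writing $y = 8x^3$ or $x = \tfrac{1}{2}\sqrt[3]{y}$. Now

$$\frac{dx}{dy} = \frac{1}{6y^{2/3}},$$

and, accordingly, we have

$$\Pr(a < Y < b) = \int_a^b 2\left(\frac{\sqrt[3]{y}}{2}\right)\left(\frac{1}{6y^{2/3}}\right) dy = \int_a^b \frac{1}{6y^{1/3}} \, dy.$$

Since this is true for every $0 < a < b < 8$, the p.d.f. $g(y)$ of $Y$ is the integrand;
that is,

$$\begin{aligned} g(y) &= \frac{1}{6y^{1/3}}, \qquad 0 < y < 8, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

It is worth noting that we found the p.d.f. of the random variable
$Y = 8X^3$ by using a theorem on the change of variable in a definite
integral. However, to obtain $g(y)$ we actually need only two things:
(1) the set $\mathscr{B}$ of points $y$ where $g(y) > 0$ and (2) the integrand of the
integral on $y$ to which $\Pr(a < Y < b)$ is equal. These can be found
by two simple rules:

1. Verify that the transformation $y = 8x^3$ maps $\mathscr{A} = \{x : 0 < x < 1\}$
onto $\mathscr{B} = \{y : 0 < y < 8\}$ and that the transformation is one-to-one.

2. Determine $g(y)$ on this set $\mathscr{B}$ by substituting $\tfrac{1}{2}\sqrt[3]{y}$ for $x$ in $f(x)$
and then multiplying this result by the derivative of $\tfrac{1}{2}\sqrt[3]{y}$. That
is,

$$\begin{aligned} g(y) = f\left(\frac{\sqrt[3]{y}}{2}\right) \frac{d[(\tfrac{1}{2})\sqrt[3]{y}]}{dy} &= \frac{1}{6y^{1/3}}, \qquad 0 < y < 8, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$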

We shall accept a theorem in analysis on the change of variable in
a definite integral to enable us to state a more general result. Let $X$ be
a random variable of the continuous type having p.d.f. $f(x)$. Let $\mathscr{A}$
be the one-dimensional space where $f(x) > 0$. Consider the random
variable $Y = u(X)$, where $y = u(x)$ defines a one-to-one transformation
that maps the set $\mathscr{A}$ onto the set $\mathscr{B}$. Let the inverse of $y = u(x)$
be denoted by $x = w(y)$, and let the derivative $dx/dy = w'(y)$ be
continuous and not equal zero for all points $y$ in $\mathscr{B}$. Then the p.d.f.
of the random variable $Y = u(X)$ is given by

$$\begin{aligned} g(y) &= f[w(y)]\,|w'(y)|, \qquad y \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere,} \end{aligned}$$

where $|w'(y)|$ represents the absolute value of $w'(y)$. This is precisely
what we did in Example 1 of this section, except there we deliberately
chose $y = 8x^3$ to be an increasing function so that

$$\frac{dx}{dy} = w'(y) = \frac{1}{6y^{2/3}}, \qquad 0 < y < 8,$$

is positive, and hence

$$\left|\frac{1}{6y^{2/3}}\right| = \frac{1}{6y^{2/3}}, \qquad 0 < y < 8.$$

Henceforth, we shall refer to $dx/dy = w'(y)$ as the Jacobian (denoted
by $J$) of the transformation. In most mathematical areas, $J = w'(y)$ is
referred to as the Jacobian of the inverse transformation $x = w(y)$, but
in this book it will be called the Jacobian of the transformation, simply
for convenience.
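
The formula $g(y) = f[w(y)]\,|w'(y)|$ can be checked numerically on Example 1 of this section. The following sketch is illustrative only; the interval endpoints `a`, `b`, the midpoint-rule step count, and the function names are assumptions introduced here.

```python
def f(x):
    """p.d.f. of X in Example 1: f(x) = 2x on 0 < x < 1."""
    return 2.0 * x if 0.0 < x < 1.0 else 0.0

def w(y):
    """Inverse transformation x = w(y) = y^(1/3) / 2."""
    return y ** (1.0 / 3.0) / 2.0

def w_prime(y):
    """Jacobian dx/dy = 1 / (6 y^(2/3))."""
    return 1.0 / (6.0 * y ** (2.0 / 3.0))

def g(y):
    """p.d.f. of Y = 8 X^3 from g(y) = f[w(y)] |w'(y)| on 0 < y < 8."""
    return f(w(y)) * abs(w_prime(y)) if 0.0 < y < 8.0 else 0.0

def prob_y(a, b, steps=20_000):
    """Midpoint-rule approximation of Pr(a < Y < b)."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

a, b = 1.0, 4.0  # hypothetical interval inside (0, 8)
# The same event in terms of X is w(a) < X < w(b), with probability w(b)^2 - w(a)^2.
print(prob_y(a, b), w(b) ** 2 - w(a) ** 2)
```

The two printed numbers should agree closely, since both describe the probability of one and the same event.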


Example 2. Let $X$ have the p.d.f.

$$\begin{aligned} f(x) &= 1, \qquad 0 < x < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

We are to show that the random variable $Y = -2 \ln X$ has a chi-square
distribution with 2 degrees of freedom. Here the transformation
is $y = u(x) = -2 \ln x$, so that $x = w(y) = e^{-y/2}$. The space $\mathscr{A}$ is
$\mathscr{A} = \{x : 0 < x < 1\}$, which the one-to-one transformation $y = -2 \ln x$ maps onto
$\mathscr{B} = \{y : 0 < y < \infty\}$. The Jacobian of the transformation is

$$J = \frac{dx}{dy} = w'(y) = -\tfrac{1}{2} e^{-y/2}.$$

Accordingly, the p.d.f. $g(y)$ of $Y = -2 \ln X$ is

$$\begin{aligned} g(y) = f(e^{-y/2})\,|J| &= \tfrac{1}{2} e^{-y/2}, \qquad 0 < y < \infty, \\ &= 0 \qquad \text{elsewhere,} \end{aligned}$$

a p.d.f. that is chi-square with 2 degrees of freedom. Note that this problem
was first proposed in Exercise 3.46.
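
A short simulation can make the result of Example 2 concrete. This is a sketch under stated assumptions: the seed, the sample size, and the comparison points are arbitrary choices, and the chi-square(2) distribution function $1 - e^{-y/2}$ is obtained by integrating the p.d.f. above.

```python
import math
import random

def chi_square_2_cdf(y):
    """Distribution function of chi-square(2): 1 - e^{-y/2} for y > 0."""
    return 1.0 - math.exp(-y / 2.0) if y > 0.0 else 0.0

rng = random.Random(2024)  # fixed seed so the run is repeatable
n = 100_000
# 1 - rng.random() lies in (0, 1], which avoids taking log(0).
sample = [-2.0 * math.log(1.0 - rng.random()) for _ in range(n)]

def empirical_cdf(y):
    """Fraction of simulated values of Y = -2 ln X that are at most y."""
    return sum(1 for v in sample if v <= y) / n

for y in (0.5, 1.0, 2.0, 5.0):
    print(y, empirical_cdf(y), chi_square_2_cdf(y))
```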


This method of finding the p.d.f. of a function of one random
variable of the continuous type will now be extended to functions of
two random variables of this type. Again, only functions that define
a one-to-one transformation will be considered at this time. Let
$y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ define a one-to-one transformation
that maps a (two-dimensional) set $\mathscr{A}$ in the $x_1 x_2$-plane onto a
(two-dimensional) set $\mathscr{B}$ in the $y_1 y_2$-plane. If we express each of $x_1$ and
$x_2$ in terms of $y_1$ and $y_2$, we can write $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$.
The determinant of order 2,

$$\begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix},$$

is called the Jacobian of the transformation and will be denoted
by the symbol $J$. It will be assumed that these first-order partial
derivatives are continuous and that the Jacobian $J$ is not identically
equal to zero in $\mathscr{B}$. An illustrative example may be
desirable before we proceed with the extension of the change of
variable technique to two random variables of the continuous
type.

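Example 3. Let $\mathscr{A}$ be the set $\mathscr{A} = \{(x_1, x_2) : 0 < x_1 < 1, 0 < x_2 < 1\}$
depicted in Figure 4.1. We wish to determine the set $\mathscr{B}$ in the $y_1 y_2$-plane that
is the mapping of $\mathscr{A}$ under the one-to-one transformation

$$y_1 = u_1(x_1, x_2) = x_1 + x_2,$$
$$y_2 = u_2(x_1, x_2) = x_1 - x_2,$$

and we wish to compute the Jacobian of the transformation. Now

$$x_1 = w_1(y_1, y_2) = \tfrac{1}{2}(y_1 + y_2),$$
$$x_2 = w_2(y_1, y_2) = \tfrac{1}{2}(y_1 - y_2).$$

To determine the set $\mathscr{B}$ in the $y_1 y_2$-plane onto which $\mathscr{A}$ is mapped under the
transformation, note that the boundaries of $\mathscr{A}$ are transformed as follows into
the boundaries of $\mathscr{B}$:

$$\begin{array}{lcl} x_1 = 0 & \text{into} & 0 = \tfrac{1}{2}(y_1 + y_2), \\ x_1 = 1 & \text{into} & 1 = \tfrac{1}{2}(y_1 + y_2), \\ x_2 = 0 & \text{into} & 0 = \tfrac{1}{2}(y_1 - y_2), \\ x_2 = 1 & \text{into} & 1 = \tfrac{1}{2}(y_1 - y_2). \end{array}$$

FIGURE 4.1

FIGURE 4.2

Accordingly, $\mathscr{B}$ is shown in Figure 4.2. Finally,

$$J = \begin{vmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} \\[2ex] \dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} \end{vmatrix} = \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\[1ex] \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = -\tfrac{1}{2}.$$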
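
The Jacobian and the image set of Example 3 can be cross-checked numerically. The sketch below is an illustration only: the central-difference helper `jacobian_det`, the step size `h`, and the test points are assumptions introduced here, not part of the text.

```python
def jacobian_det(w1, w2, y1, y2, h=1e-6):
    """Central-difference estimate of the 2x2 Jacobian of (x1, x2) with respect to (y1, y2)."""
    dx1_dy1 = (w1(y1 + h, y2) - w1(y1 - h, y2)) / (2 * h)
    dx1_dy2 = (w1(y1, y2 + h) - w1(y1, y2 - h)) / (2 * h)
    dx2_dy1 = (w2(y1 + h, y2) - w2(y1 - h, y2)) / (2 * h)
    dx2_dy2 = (w2(y1, y2 + h) - w2(y1, y2 - h)) / (2 * h)
    return dx1_dy1 * dx2_dy2 - dx1_dy2 * dx2_dy1

def w1(y1, y2):
    """Inverse function x1 = (y1 + y2) / 2 of Example 3."""
    return 0.5 * (y1 + y2)

def w2(y1, y2):
    """Inverse function x2 = (y1 - y2) / 2 of Example 3."""
    return 0.5 * (y1 - y2)

def in_image_set(y1, y2):
    """Membership in the image set, tested through 0 < x1 < 1 and 0 < x2 < 1."""
    return 0.0 < w1(y1, y2) < 1.0 and 0.0 < w2(y1, y2) < 1.0

print(jacobian_det(w1, w2, 1.0, 0.2))                     # estimate of J
print(in_image_set(1.0, 0.2), in_image_set(0.5, 0.8))     # inside, outside
```

Testing membership through the inverse functions is the inequality approach: a point $(y_1, y_2)$ belongs to the image exactly when its preimage lies in the unit square.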


Remark. Although, in the preceding Example 3, we suggest transforming the
boundaries of $\mathscr{A}$, others might want to use the inequalities

$$0 < x_1 < 1 \quad \text{and} \quad 0 < x_2 < 1$$

directly. These four inequalities become

$$0 < \tfrac{1}{2}(y_1 + y_2) < 1 \quad \text{and} \quad 0 < \tfrac{1}{2}(y_1 - y_2) < 1.$$

It is easy to see that these are equivalent to

$$-y_1 < y_2, \quad y_2 < 2 - y_1, \quad y_2 < y_1, \quad y_1 - 2 < y_2;$$

and they define the set $\mathscr{B}$. In this example, these methods were rather simple
and essentially the same. Other examples could present more complicated
transformations, and only experience can help one decide which is the best
method in each case.

We now proceed with the problem of finding the joint p.d.f. of
two functions of two continuous-type random variables. Let $X_1$ and $X_2$
be random variables of the continuous type, having joint p.d.f.
$h(x_1, x_2)$. Let $\mathscr{A}$ be the two-dimensional set in the $x_1 x_2$-plane where
$h(x_1, x_2) > 0$. Let $Y_1 = u_1(X_1, X_2)$ be a random variable whose p.d.f.
is to be found. If $y_1 = u_1(x_1, x_2)$ and $y_2 = u_2(x_1, x_2)$ define a one-to-one
transformation of $\mathscr{A}$ onto a set $\mathscr{B}$ in the $y_1 y_2$-plane (with
nonidentically zero Jacobian), we can find, by use of a theorem in
analysis, the joint p.d.f. of $Y_1 = u_1(X_1, X_2)$ and $Y_2 = u_2(X_1, X_2)$. Let
$A$ be a subset of $\mathscr{A}$, and let $B$ denote the mapping of $A$ under the
one-to-one transformation (see Figure 4.3). The events $(X_1, X_2) \in A$
and $(Y_1, Y_2) \in B$ are equivalent. Hence

$$\Pr[(Y_1, Y_2) \in B] = \Pr[(X_1, X_2) \in A] = \iint_A h(x_1, x_2) \, dx_1 \, dx_2.$$

We wish now to change variables of integration by writing $y_1 = u_1(x_1, x_2)$,
$y_2 = u_2(x_1, x_2)$, or $x_1 = w_1(y_1, y_2)$, $x_2 = w_2(y_1, y_2)$. It has been
proved in analysis that this change of variables requires

$$\iint_A h(x_1, x_2) \, dx_1 \, dx_2 = \iint_B h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J| \, dy_1 \, dy_2.$$

Thus, for every set $B$ in $\mathscr{B}$,

$$\Pr[(Y_1, Y_2) \in B] = \iint_B h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J| \, dy_1 \, dy_2,$$

which implies that the joint p.d.f. $g(y_1, y_2)$ of $Y_1$ and $Y_2$ is

$$\begin{aligned} g(y_1, y_2) &= h[w_1(y_1, y_2), w_2(y_1, y_2)]\,|J|, \qquad (y_1, y_2) \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Accordingly, the marginal p.d.f. $g_1(y_1)$ of $Y_1$ can be obtained from the
joint p.d.f. $g(y_1, y_2)$ in the usual manner by integrating on $y_2$. Several
examples of this result will be given.

FIGURE 4.3

Example 4. Let the random variable $X$ have the p.d.f.

$$\begin{aligned} f(x) &= 1, \qquad 0 < x < 1, \\ &= 0 \qquad \text{elsewhere,} \end{aligned}$$

and let $X_1, X_2$ denote a random sample from this distribution. The joint p.d.f.
of $X_1$ and $X_2$ is then

$$\begin{aligned} h(x_1, x_2) = f(x_1) f(x_2) &= 1, \qquad 0 < x_1 < 1, \quad 0 < x_2 < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Consider the two random variables $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. We wish
to find the joint p.d.f. of $Y_1$ and $Y_2$. Here the two-dimensional space $\mathscr{A}$ in the
$x_1 x_2$-plane is that of Example 3 of this section. The one-to-one transformation
$y_1 = x_1 + x_2$, $y_2 = x_1 - x_2$ maps $\mathscr{A}$ onto the space $\mathscr{B}$ of that example.
Moreover, the Jacobian of that transformation has been shown to be $J = -\tfrac{1}{2}$.
Thus

$$\begin{aligned} g(y_1, y_2) &= h[\tfrac{1}{2}(y_1 + y_2), \tfrac{1}{2}(y_1 - y_2)]\,|J| \\ &= f[\tfrac{1}{2}(y_1 + y_2)]\, f[\tfrac{1}{2}(y_1 - y_2)]\,|J| = \tfrac{1}{2}, \qquad (y_1, y_2) \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Because $\mathscr{B}$ is not a product space, the random variables $Y_1$ and $Y_2$ are
dependent. The marginal p.d.f. of $Y_1$ is given by

$$g_1(y_1) = \int_{-\infty}^{\infty} g(y_1, y_2) \, dy_2.$$

If we refer to Figure 4.2, it is seen that

$$\begin{aligned} g_1(y_1) &= \int_{-y_1}^{y_1} \tfrac{1}{2} \, dy_2 = y_1, \qquad 0 < y_1 \le 1, \\ &= \int_{y_1 - 2}^{2 - y_1} \tfrac{1}{2} \, dy_2 = 2 - y_1, \qquad 1 < y_1 < 2, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

In a similar manner, the marginal p.d.f. $g_2(y_2)$ is given by

$$\begin{aligned} g_2(y_2) &= \int_{-y_2}^{y_2 + 2} \tfrac{1}{2} \, dy_1 = y_2 + 1, \qquad -1 < y_2 \le 0, \\ &= \int_{y_2}^{2 - y_2} \tfrac{1}{2} \, dy_1 = 1 - y_2, \qquad 0 < y_2 < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

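Example 5. Let $X_1, X_2$ be a random sample of size $n = 2$ from a standard
normal distribution. Say that we are interested in the distribution
of $Y_1 = X_1/X_2$. Often in selecting the second random variable, we use
the denominator of the ratio or a function of that denominator. So let
$Y_2 = X_2$. With the set $\{(x_1, x_2) : -\infty < x_1 < \infty, -\infty < x_2 < \infty\}$, we note
that the ratio is not defined at $x_2 = 0$. However, $\Pr(X_2 = 0) = 0$; so we take
the p.d.f. of $X_2$ to be zero at $x_2 = 0$. This results in the set

$$\mathscr{A} = \{(x_1, x_2) : -\infty < x_1 < \infty, \; -\infty < x_2 < 0 \text{ or } 0 < x_2 < \infty\}.$$

With $y_1 = x_1/x_2$, $y_2 = x_2$ or, equivalently, $x_1 = y_1 y_2$, $x_2 = y_2$, $\mathscr{A}$ maps onto

$$\mathscr{B} = \{(y_1, y_2) : -\infty < y_1 < \infty, \; -\infty < y_2 < 0 \text{ or } 0 < y_2 < \infty\}.$$

Also,

$$J = \begin{vmatrix} y_2 & y_1 \\ 0 & 1 \end{vmatrix} = y_2 \ne 0.$$

Since

$$h(x_1, x_2) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2}(x_1^2 + x_2^2)\right], \qquad (x_1, x_2) \in \mathscr{A},$$

we have that the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = \frac{1}{2\pi} \exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right] |y_2|, \qquad (y_1, y_2) \in \mathscr{B}.$$

Thus

$$g_1(y_1) = \int_{-\infty}^{0} g(y_1, y_2) \, dy_2 + \int_{0}^{\infty} g(y_1, y_2) \, dy_2.$$

Since $g(y_1, y_2)$ is an even function of $y_2$, we can write

$$\begin{aligned} g_1(y_1) &= 2 \int_0^{\infty} \frac{1}{2\pi} \exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right] (y_2) \, dy_2 \\ &= \frac{1}{\pi} \left\{ \frac{-\exp\left[-\tfrac{1}{2} y_2^2 (1 + y_1^2)\right]}{1 + y_1^2} \right\}_0^{\infty} = \frac{1}{\pi (1 + y_1^2)}, \qquad -\infty < y_1 < \infty. \end{aligned}$$

This marginal p.d.f. of $Y_1 = X_1/X_2$ is that of a Cauchy distribution. Although
the Cauchy p.d.f. is symmetric about $y_1 = 0$, the mean does not exist because
the integral

$$\int_{-\infty}^{\infty} |y_1|\, g_1(y_1) \, dy_1$$

does not exist. The median and the mode, however, are both equal to zero.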
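
The Cauchy result of Example 5 lends itself to a quick simulation. In this sketch the seed, the sample size, and the comparison points are arbitrary assumptions; the Cauchy distribution function $\tfrac{1}{2} + \arctan(y)/\pi$ is the integral of the p.d.f. $1/[\pi(1 + y^2)]$.

```python
import math
import random

def cauchy_cdf(y):
    """Distribution function of the Cauchy p.d.f. 1 / (pi (1 + y^2))."""
    return 0.5 + math.atan(y) / math.pi

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 100_000
ratios = []
while len(ratios) < n:
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    if x2 != 0.0:  # the ratio is undefined at x2 = 0, an event of probability zero
        ratios.append(x1 / x2)

for y in (-2.0, -0.5, 0.0, 1.0, 3.0):
    empirical = sum(1 for r in ratios if r <= y) / n
    print(y, empirical, cauchy_cdf(y))
```

Because the mean does not exist, the sample average of `ratios` does not settle down as `n` grows; comparing distribution functions is the more reliable check.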

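Example 6. Let $Y_1 = \tfrac{1}{2}(X_1 - X_2)$, where $X_1$ and $X_2$ are i.i.d. random
variables, each being $\chi^2(2)$. The joint p.d.f. of $X_1$ and $X_2$ is

$$\begin{aligned} f(x_1) f(x_2) &= \frac{1}{4} \exp\left(-\frac{x_1 + x_2}{2}\right), \qquad 0 < x_1 < \infty, \quad 0 < x_2 < \infty, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Let $Y_2 = X_2$ so that $y_1 = \tfrac{1}{2}(x_1 - x_2)$, $y_2 = x_2$ or $x_1 = 2y_1 + y_2$, $x_2 = y_2$ define
a one-to-one transformation from $\mathscr{A} = \{(x_1, x_2) : 0 < x_1 < \infty, 0 < x_2 < \infty\}$
onto $\mathscr{B} = \{(y_1, y_2) : -2y_1 < y_2 \text{ and } 0 < y_2, \; -\infty < y_1 < \infty\}$. The Jacobian of
the transformation is

$$J = \begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} = 2;$$

hence the joint p.d.f. of $Y_1$ and $Y_2$ is

$$\begin{aligned} g(y_1, y_2) &= \frac{|2|}{4} e^{-y_1 - y_2}, \qquad (y_1, y_2) \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Thus the p.d.f. of $Y_1$ is given by

$$\begin{aligned} g_1(y_1) &= \int_{-2y_1}^{\infty} \tfrac{1}{2} e^{-y_1 - y_2} \, dy_2 = \tfrac{1}{2} e^{y_1}, \qquad -\infty < y_1 < 0, \\ &= \int_{0}^{\infty} \tfrac{1}{2} e^{-y_1 - y_2} \, dy_2 = \tfrac{1}{2} e^{-y_1}, \qquad 0 \le y_1 < \infty, \end{aligned}$$

or

$$g_1(y_1) = \tfrac{1}{2} e^{-|y_1|}, \qquad -\infty < y_1 < \infty.$$

This p.d.f. is now frequently called the double exponential p.d.f.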

Example 7. In this example a rather important result is established. Let
$X_1$ and $X_2$ be independent random variables of the continuous type with joint
p.d.f. $f_1(x_1) f_2(x_2)$ that is positive on the two-dimensional space $\mathscr{A}$. Let
$Y_1 = u_1(X_1)$, a function of $X_1$ alone, and $Y_2 = u_2(X_2)$, a function of $X_2$ alone.
We assume for the present that $y_1 = u_1(x_1)$, $y_2 = u_2(x_2)$ define a one-to-one
transformation from $\mathscr{A}$ onto a two-dimensional set $\mathscr{B}$ in the $y_1 y_2$-plane.
Solving for $x_1$ and $x_2$ in terms of $y_1$ and $y_2$, we have $x_1 = w_1(y_1)$ and $x_2 = w_2(y_2)$,
so

$$J = \begin{vmatrix} w_1'(y_1) & 0 \\ 0 & w_2'(y_2) \end{vmatrix} = w_1'(y_1)\, w_2'(y_2) \ne 0.$$

Hence the joint p.d.f. of $Y_1$ and $Y_2$ is

$$\begin{aligned} g(y_1, y_2) &= f_1[w_1(y_1)]\, f_2[w_2(y_2)]\, |w_1'(y_1)\, w_2'(y_2)|, \qquad (y_1, y_2) \in \mathscr{B}, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

However, from the procedure for changing variables in the case of
one random variable, we see that the marginal probability density
functions of $Y_1$ and $Y_2$ are, respectively, $g_1(y_1) = f_1[w_1(y_1)]\,|w_1'(y_1)|$ and
$g_2(y_2) = f_2[w_2(y_2)]\,|w_2'(y_2)|$ for $y_1$ and $y_2$ in some appropriate sets.
Consequently,

$$g(y_1, y_2) \equiv g_1(y_1)\, g_2(y_2).$$

Thus, summarizing, we note that if $X_1$ and $X_2$ are independent random
variables, then $Y_1 = u_1(X_1)$ and $Y_2 = u_2(X_2)$ are also independent random
variables. It has been seen that the result holds if $X_1$ and $X_2$ are of the discrete
type; see Exercise 4.22.

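In the simulation of random variables using uniform random
variables, it is frequently difficult to solve $y = F(x)$ for $x$. Thus other
methods are necessary. For instance, consider the important normal
case in which we desire to determine $X$ so that it is $N(0, 1)$. Of course,
once $X$ is determined, other normal variables can then be obtained
through $X$ by the transformation $Z = \sigma X + \mu$.

To simulate normal variables, Box and Muller suggested the
following procedure. Let $Y_1, Y_2$ be a random sample from the uniform
distribution over $0 < y < 1$. Define $X_1$ and $X_2$ by

$$X_1 = (-2 \ln Y_1)^{1/2} \cos(2\pi Y_2),$$
$$X_2 = (-2 \ln Y_1)^{1/2} \sin(2\pi Y_2).$$

The corresponding transformation is one-to-one and maps
$\{(y_1, y_2) : 0 < y_1 < 1, 0 < y_2 < 1\}$ onto $\{(x_1, x_2) : -\infty < x_1 < \infty,
-\infty < x_2 < \infty\}$ except for sets involving $x_1 = 0$ and $x_2 = 0$, which
have probability zero. The inverse transformation is given by

$$y_1 = \exp\left(-\frac{x_1^2 + x_2^2}{2}\right),$$
$$y_2 = \frac{1}{2\pi} \arctan\frac{x_2}{x_1}.$$

This has the Jacobian

$$\begin{aligned} J &= \begin{vmatrix} (-x_1) \exp\left(-\dfrac{x_1^2 + x_2^2}{2}\right) & (-x_2) \exp\left(-\dfrac{x_1^2 + x_2^2}{2}\right) \\[3ex] \dfrac{-x_2/x_1^2}{(2\pi)(1 + x_2^2/x_1^2)} & \dfrac{1/x_1}{(2\pi)(1 + x_2^2/x_1^2)} \end{vmatrix} \\[1ex] &= \frac{-(1 + x_2^2/x_1^2) \exp\left(-\dfrac{x_1^2 + x_2^2}{2}\right)}{(2\pi)(1 + x_2^2/x_1^2)} = \frac{-\exp\left(-\dfrac{x_1^2 + x_2^2}{2}\right)}{2\pi}. \end{aligned}$$

Since the joint p.d.f. of $Y_1$ and $Y_2$ is 1 on $0 < y_1 < 1$, $0 < y_2 < 1$, and zero
elsewhere, the joint p.d.f. of $X_1$ and $X_2$ is

$$\frac{\exp\left(-\dfrac{x_1^2 + x_2^2}{2}\right)}{2\pi}, \qquad -\infty < x_1 < \infty, \quad -\infty < x_2 < \infty.$$

That is, $X_1$ and $X_2$ are independent standard normal random variables.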
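
The Box-Muller procedure translates directly into code. What follows is a minimal sketch, assuming Python's standard `random` module as the source of uniform values; the seed, the sample size, and the parameters `mu` and `sigma` in the last lines are hypothetical.

```python
import math
import random

def box_muller(y1, y2):
    """Map two uniform(0, 1) values to two independent N(0, 1) values."""
    radius = math.sqrt(-2.0 * math.log(y1))
    angle = 2.0 * math.pi * y2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(42)  # fixed seed so the run is repeatable
xs = []
for _ in range(50_000):
    y1 = 1.0 - rng.random()  # lies in (0, 1], so log(y1) is defined
    y2 = rng.random()
    xs.extend(box_muller(y1, y2))

mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # should be near 0 and 1, respectively

# Other normal variables follow from Z = sigma * X + mu.
mu, sigma = 10.0, 2.0  # hypothetical parameters
zs = [sigma * x + mu for x in xs]
```

Each pair of uniform values yields two normal values, so no uniform draw is wasted.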
We close this section by observing a way of finding the p.d.f.
of the sum of two independent random variables. Let $X_1$ and $X_2$
be independent with respective probability density functions $f_1(x_1)$
and $f_2(x_2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_2$. Thus we have the
one-to-one transformation $x_1 = y_1 - y_2$ and $x_2 = y_2$ with Jacobian
$J = 1$. Here we say that $\mathscr{A} = \{(x_1, x_2) : -\infty < x_1 < \infty, -\infty < x_2 < \infty\}$
maps onto $\mathscr{B} = \{(y_1, y_2) : -\infty < y_1 < \infty, -\infty < y_2 < \infty\}$, but we
recognize that in a particular problem the joint p.d.f. might equal zero
on some part of these sets. Thus the joint p.d.f. of $Y_1$ and $Y_2$ is

$$g(y_1, y_2) = f_1(y_1 - y_2)\, f_2(y_2), \qquad (y_1, y_2) \in \mathscr{B},$$

and the marginal p.d.f. of $Y_1 = X_1 + X_2$ is given by

$$g_1(y_1) = \int_{-\infty}^{\infty} f_1(y_1 - y_2)\, f_2(y_2) \, dy_2,$$

which is the well-known convolution formula.
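
The convolution formula can be evaluated numerically when the integral is awkward in closed form. The sketch below is illustrative: the midpoint rule, the step count, and the function names are assumptions, and the uniform p.d.f. is used because the answer can be compared with the marginal p.d.f. of $Y_1 = X_1 + X_2$ found in Example 4 of this section.

```python
def convolve(f1, f2, y1, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f1(y1 - y2) f2(y2) dy2 over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        y2 = lo + (i + 0.5) * h
        total += f1(y1 - y2) * f2(y2)
    return total * h

def uniform_pdf(x):
    """p.d.f. f(x) = 1 on 0 < x < 1, zero elsewhere."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

# f2 vanishes outside (0, 1), so integrating over [0, 1] is enough here.
for y1 in (0.25, 0.5, 1.0, 1.5, 1.75):
    print(y1, convolve(uniform_pdf, uniform_pdf, y1, 0.0, 1.0))
```

The limits `lo` and `hi` only need to cover the set where $f_2$ is positive, which is the point made above about the joint p.d.f. being zero on part of $\mathscr{B}$.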


EXERCISES 

4.25. Let $X$ have the p.d.f. $f(x) = x^2/9$, $0 < x < 3$, zero elsewhere. Find the
p.d.f. of $Y = X^3$.

4.26. If the p.d.f. of $X$ is $f(x) = 2x e^{-x^2}$, $0 < x < \infty$, zero elsewhere, determine
the p.d.f. of $Y = X^2$.

4.27. Let $X$ have the logistic p.d.f. $f(x) = e^{-x}/(1 + e^{-x})^2$, $-\infty < x < \infty$.
(a) Show that the graph of $f(x)$ is symmetric about the vertical axis through
$x = 0$.

(b) Find the distribution function of $X$.

(c) Find the p.d.f. of $Y = e^{-X}$.

(d) Show that the m.g.f. $M(t)$ of $X$ is $\Gamma(1 - t)\Gamma(1 + t)$, $-1 < t < 1$.
Hint: In the integral representing $M(t)$, let $y = (1 + e^{-x})^{-1}$.

4.28. Let $X$ have the uniform distribution over the interval $(-\pi/2, \pi/2)$.
Show that $Y = \tan X$ has a Cauchy distribution.

4.29. Let $X_1$ and $X_2$ be two independent normal random variables, each
with mean zero and variance one (possibly resulting from a Box-Muller
transformation). Show that

$$Z_1 = \mu_1 + \sigma_1 X_1,$$
$$Z_2 = \mu_2 + \rho \sigma_2 X_1 + \sigma_2 \sqrt{1 - \rho^2}\, X_2,$$

where $0 < \sigma_1$, $0 < \sigma_2$, and $0 < \rho < 1$, have a bivariate normal distribution
with respective parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

4.30. Let $X_1$ and $X_2$ denote a random sample of size 2 from a distribution that
is $N(\mu, \sigma^2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 - X_2$. Find the joint p.d.f. of $Y_1$
and $Y_2$ and show that these random variables are independent.

4.31. Let $X_1$ and $X_2$ denote a random sample of size 2 from a distribution that
is $N(\mu, \sigma^2)$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1 + 2X_2$. Show that the joint p.d.f.
of $Y_1$ and $Y_2$ is bivariate normal with correlation coefficient $3/\sqrt{10}$.

4.32. Use the convolution formula to determine the p.d.f. of $Y_1 = X_1 + X_2$,
where $X_1$ and $X_2$ are i.i.d. random variables, each with p.d.f. $f(x) = \ldots$,
$0 < x < \infty$, zero elsewhere.

Hint: Note that the integral on $y_2$ has limits of 0 and $y_1$, where
$0 < y_1 < \infty$. Why?

4.33. Let $X_1$ and $X_2$ have the joint p.d.f. $h(x_1, x_2) = \ldots$,
$0 < x_1 < x_2 < \infty$, zero elsewhere. Find the joint p.d.f. of $Y_1 = 2X_1$ and
$Y_2 = X_2 - X_1$ and argue that $Y_1$ and $Y_2$ are independent.

4.34. Let $X_1$ and $X_2$ have the joint p.d.f. $h(x_1, x_2) = 8 x_1 x_2$, $0 < x_1 < x_2 < 1$,
zero elsewhere. Find the joint p.d.f. of $Y_1 = X_1/X_2$ and $Y_2 = X_2$ and argue
that $Y_1$ and $Y_2$ are independent.

Hint: Use the inequalities $0 < y_1 y_2 < y_2 < 1$ in considering the mapping
from $\mathscr{A}$ onto $\mathscr{B}$.


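4.4 The Beta, t, and F Distributions

It is the purpose of this section to define three additional
distributions quite useful in certain problems of statistical inference.
These are called, respectively, the beta distribution, the (Student's)
t-distribution, and the F-distribution.

The beta distribution. Let $X_1$ and $X_2$ be two independent random
variables that have gamma distributions and joint p.d.f.

$$h(x_1, x_2) = \frac{1}{\Gamma(\alpha)\Gamma(\beta)}\, x_1^{\alpha - 1} x_2^{\beta - 1} e^{-x_1 - x_2}, \qquad 0 < x_1 < \infty, \quad 0 < x_2 < \infty,$$

zero elsewhere, where $\alpha > 0$, $\beta > 0$. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_1/(X_1 + X_2)$.
We shall show that $Y_1$ and $Y_2$ are independent.
The space $\mathscr{A}$ is, exclusive of the points on the coordinate axes, the
first quadrant of the $x_1 x_2$-plane. Now

$$y_1 = u_1(x_1, x_2) = x_1 + x_2,$$
$$y_2 = u_2(x_1, x_2) = \frac{x_1}{x_1 + x_2}$$

may be written $x_1 = y_1 y_2$, $x_2 = y_1(1 - y_2)$, so

$$J = \begin{vmatrix} y_2 & y_1 \\ 1 - y_2 & -y_1 \end{vmatrix} = -y_1 \ne 0.$$

The transformation is one-to-one, and it maps $\mathscr{A}$ onto
$\mathscr{B} = \{(y_1, y_2) : 0 < y_1 < \infty, 0 < y_2 < 1\}$ in the $y_1 y_2$-plane. The joint p.d.f.
of $Y_1$ and $Y_2$ is then

$$\begin{aligned} g(y_1, y_2) &= (y_1) \frac{1}{\Gamma(\alpha)\Gamma(\beta)} (y_1 y_2)^{\alpha - 1} [y_1 (1 - y_2)]^{\beta - 1} e^{-y_1} \\ &= \frac{y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)}\, y_1^{\alpha + \beta - 1} e^{-y_1}, \qquad 0 < y_1 < \infty, \quad 0 < y_2 < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

In accordance with Theorem 1, Section 2.4, the random variables are
independent. The marginal p.d.f. of $Y_2$ is

$$\begin{aligned} g_2(y_2) &= \frac{y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}}{\Gamma(\alpha)\Gamma(\beta)} \int_0^{\infty} y_1^{\alpha + \beta - 1} e^{-y_1} \, dy_1 \\ &= \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, y_2^{\alpha - 1} (1 - y_2)^{\beta - 1}, \qquad 0 < y_2 < 1, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

This p.d.f. is that of the beta distribution with parameters $\alpha$ and $\beta$. Since
$g(y_1, y_2) \equiv g_1(y_1)\, g_2(y_2)$, it must be that the p.d.f. of $Y_1$ is

$$\begin{aligned} g_1(y_1) &= \frac{1}{\Gamma(\alpha + \beta)}\, y_1^{\alpha + \beta - 1} e^{-y_1}, \qquad 0 < y_1 < \infty, \\ &= 0 \qquad \text{elsewhere,} \end{aligned}$$

which is that of a gamma distribution with parameter values of
$\alpha + \beta$ and 1.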

It is an easy exercise to show that the mean and the variance of
$Y_2$, which has a beta distribution with parameters $\alpha$ and $\beta$, are,
respectively,

$$\mu = \frac{\alpha}{\alpha + \beta}, \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta + 1)(\alpha + \beta)^2}.$$
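
The construction of the beta distribution from two gamma variables suggests a direct simulation check of the mean and variance. This is a sketch under stated assumptions: the parameter values `alpha = 2.0`, `beta = 3.0`, the seed, and the sample size are hypothetical, and `random.gammavariate(shape, scale)` from the Python standard library supplies the gamma variables with scale 1.

```python
import random

rng = random.Random(99)  # fixed seed so the run is repeatable
alpha, beta = 2.0, 3.0   # hypothetical parameter values
n = 100_000

y2 = []
for _ in range(n):
    x1 = rng.gammavariate(alpha, 1.0)  # gamma with shape alpha, scale 1
    x2 = rng.gammavariate(beta, 1.0)   # gamma with shape beta, scale 1
    y2.append(x1 / (x1 + x2))          # Y2 = X1 / (X1 + X2)

mean = sum(y2) / n
var = sum((v - mean) ** 2 for v in y2) / n

# Sample moments next to the formulas for the beta mean and variance.
print(mean, alpha / (alpha + beta))
print(var, alpha * beta / ((alpha + beta + 1.0) * (alpha + beta) ** 2))
```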

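The t-distribution. Let $W$ denote a random variable that is $N(0, 1)$;
let $V$ denote a random variable that is $\chi^2(r)$; and let $W$ and $V$ be
independent. Then the joint p.d.f. of $W$ and $V$, say $h(w, v)$, is the
product of the p.d.f. of $W$ and that of $V$ or

$$\begin{aligned} h(w, v) &= \frac{1}{\sqrt{2\pi}} e^{-w^2/2} \frac{1}{\Gamma(r/2)\, 2^{r/2}} v^{r/2 - 1} e^{-v/2}, \qquad -\infty < w < \infty, \quad 0 < v < \infty, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

Define a new random variable $T$ by writing

$$T = \frac{W}{\sqrt{V/r}}.$$

The change-of-variable technique will be used to obtain the p.d.f. $g_1(t)$
of $T$. The equations

$$t = \frac{w}{\sqrt{v/r}} \quad \text{and} \quad u = v$$

define a one-to-one transformation that maps $\mathscr{A} = \{(w, v) : -\infty < w < \infty, 0 < v < \infty\}$
onto $\mathscr{B} = \{(t, u) : -\infty < t < \infty, 0 < u < \infty\}$.
Since $w = t\sqrt{u}/\sqrt{r}$, $v = u$, the absolute value of the Jacobian of the
transformation is $|J| = \sqrt{u}/\sqrt{r}$. Accordingly, the joint p.d.f. of $T$
and $U = V$ is given by

$$\begin{aligned} g(t, u) &= h\left(\frac{t\sqrt{u}}{\sqrt{r}}, u\right) |J| \\ &= \frac{1}{\sqrt{2\pi}\, \Gamma(r/2)\, 2^{r/2}}\, u^{r/2 - 1} \exp\left[-\frac{u}{2}\left(1 + \frac{t^2}{r}\right)\right] \frac{\sqrt{u}}{\sqrt{r}}, \qquad -\infty < t < \infty, \quad 0 < u < \infty, \\ &= 0 \qquad \text{elsewhere.} \end{aligned}$$

The marginal p.d.f. of $T$ is then

$$\begin{aligned} g_1(t) &= \int_{-\infty}^{\infty} g(t, u) \, du \\ &= \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}}\, u^{(r + 1)/2 - 1} \exp\left[-\frac{u}{2}\left(1 + \frac{t^2}{r}\right)\right] du. \end{aligned}$$

In this integral let $z = u[1 + (t^2/r)]/2$, and it is seen that

$$\begin{aligned} g_1(t) &= \int_0^{\infty} \frac{1}{\sqrt{2\pi r}\, \Gamma(r/2)\, 2^{r/2}} \left(\frac{2z}{1 + t^2/r}\right)^{(r + 1)/2 - 1} e^{-z} \left(\frac{2}{1 + t^2/r}\right) dz \\ &= \frac{\Gamma[(r + 1)/2]}{\sqrt{\pi r}\, \Gamma(r/2)} \frac{1}{(1 + t^2/r)^{(r + 1)/2}}, \qquad -\infty < t < \infty. \end{aligned}$$

Thus, if $W$ is $N(0, 1)$, if $V$ is $\chi^2(r)$, and if $W$ and $V$ are independent,
then

$$T = \frac{W}{\sqrt{V/r}}$$

has the immediately preceding p.d.f. $g_1(t)$. The distribution of the
random variable $T$ is usually called a t-distribution. It should
be observed that a t-distribution is completely determined by the
parameter $r$, the number of degrees of freedom of the random variable
that has the chi-square distribution. Some approximate values of

$$\Pr(T \le t) = \int_{-\infty}^{t} g_1(w) \, dw$$

for selected values of $r$ and $t$ can be found in Table IV in Appendix B.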
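
Values of $\Pr(T \le t)$ of the kind tabulated in Table IV can be approximated from the p.d.f. $g_1(t)$ directly. The following is a rough sketch, not a replacement for the table: the midpoint rule, the cut-off `lower = -200.0`, the step count, and the function names are assumptions, and the neglected lower tail makes the result slightly too small when $r$ is very small.

```python
import math

def t_pdf(t, r):
    """p.d.f. of the t-distribution with r degrees of freedom."""
    const = math.gamma((r + 1) / 2.0) / (math.sqrt(math.pi * r) * math.gamma(r / 2.0))
    return const / (1.0 + t * t / r) ** ((r + 1) / 2.0)

def t_cdf(t, r, lower=-200.0, steps=100_000):
    """Midpoint-rule approximation of Pr(T <= t); the tail below `lower` is ignored."""
    h = (t - lower) / steps
    return sum(t_pdf(lower + (i + 0.5) * h, r) for i in range(steps)) * h

print(t_pdf(0.0, 1))    # with r = 1 the p.d.f. reduces to the Cauchy p.d.f.
print(t_cdf(0.0, 5))    # close to one-half by symmetry
print(t_cdf(2.015, 5))  # compare with a t table for r = 5
```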

Remark. This distribution was first discovered by W. S. Gosset when he
was working for an Irish brewery. Because that brewery did not want other
breweries to know that statistical methods were being used, Gosset published
under the pseudonym Student. Thus this distribution is often known as
Student's t-distribution.

The F-distribution. Next consider two independent chi-square
random variables U and V having $r_1$ and $r_2$ degrees of freedom,
respectively. The joint p.d.f. h(u, v) of U and V is then

$$
\begin{aligned}
h(u, v) &= \frac{1}{\Gamma(r_1/2)\,\Gamma(r_2/2)\, 2^{(r_1+r_2)/2}}\, u^{r_1/2-1} v^{r_2/2-1} e^{-(u+v)/2}, \qquad 0 < u < \infty,\ 0 < v < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

We define the new random variable

$$W = \frac{U/r_1}{V/r_2}$$

and we propose finding the p.d.f. $g_1(w)$ of W. The equations

$$w = \frac{u/r_1}{v/r_2}, \qquad z = v,$$

define a one-to-one transformation that maps the set $\mathcal{A} = \{(u, v) : 0 < u < \infty,\ 0 < v < \infty\}$
onto the set $\mathcal{B} = \{(w, z) : 0 < w < \infty,\ 0 < z < \infty\}$.
Since $u = (r_1/r_2)zw$, $v = z$, the absolute value of the
Jacobian of the transformation is $|J| = (r_1/r_2)z$. The joint p.d.f. g(w, z)
of the random variables W and Z = V is then

$$
g(w, z) = \frac{1}{\Gamma(r_1/2)\,\Gamma(r_2/2)\, 2^{(r_1+r_2)/2}} \left(\frac{r_1 z w}{r_2}\right)^{r_1/2-1} z^{r_2/2-1} \exp\!\left[-\frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right)\right] \frac{r_1 z}{r_2},
$$

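provided that $(w, z) \in \mathcal{B}$, and zero elsewhere. The marginal p.d.f. $g_1(w)$
of W is then

$$
g_1(w) = \int_{-\infty}^{\infty} g(w, z)\, dz
= \int_0^{\infty} \frac{(r_1/r_2)^{r_1/2}\, w^{r_1/2-1}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\, 2^{(r_1+r_2)/2}}\, z^{(r_1+r_2)/2-1} \exp\!\left[-\frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right)\right] dz.
$$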


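If we change the variable of integration by writing

$$y = \frac{z}{2}\left(\frac{r_1 w}{r_2} + 1\right),$$

it can be seen that

$$
\begin{aligned}
g_1(w) &= \int_0^{\infty} \frac{(r_1/r_2)^{r_1/2}\, w^{r_1/2-1}}{\Gamma(r_1/2)\,\Gamma(r_2/2)\, 2^{(r_1+r_2)/2}} \left(\frac{2y}{r_1 w/r_2 + 1}\right)^{(r_1+r_2)/2-1} e^{-y} \left(\frac{2}{r_1 w/r_2 + 1}\right) dy \\
&= \frac{\Gamma[(r_1+r_2)/2]\,(r_1/r_2)^{r_1/2}}{\Gamma(r_1/2)\,\Gamma(r_2/2)}\, \frac{w^{r_1/2-1}}{(1 + r_1 w/r_2)^{(r_1+r_2)/2}}, \qquad 0 < w < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$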


Accordingly, if U and V are independent chi-square variables with
$r_1$ and $r_2$ degrees of freedom, respectively, then

$$W = \frac{U/r_1}{V/r_2}$$

has the immediately preceding p.d.f. $g_1(w)$. The distribution of this
random variable is usually called an F-distribution; and we often call
the ratio, which we have denoted by W, F. That is,

$$F = \frac{U/r_1}{V/r_2}.$$

It should be observed that an F-distribution is completely determined
by the two parameters $r_1$ and $r_2$. Table V in Appendix B gives some
approximate values of

$$\Pr(F \le b) = \int_0^{b} g_1(w)\, dw$$

for selected values of $r_1$, $r_2$, and b.
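
As a sanity check on the F p.d.f., here is a minimal Python sketch, not from the original text: the names `f_pdf` and `f_cdf` are our own, and the midpoint rule is an arbitrary choice that avoids evaluating the density at w = 0. It compares a numerical value of $\Pr(F \le b)$ with a case that can be integrated by hand. When $r_1 = 2$ the p.d.f. reduces to $(1 + 2w/r_2)^{-(r_2/2+1)}$, so that $\Pr(F \le b) = 1 - (1 + 2b/r_2)^{-r_2/2}$.

```python
import math

def f_pdf(w, r1, r2):
    """g_1(w): p.d.f. of the F-distribution with parameters r1 and r2."""
    if w <= 0:
        return 0.0
    const = (math.gamma((r1 + r2) / 2) * (r1 / r2) ** (r1 / 2)
             / (math.gamma(r1 / 2) * math.gamma(r2 / 2)))
    return const * w ** (r1 / 2 - 1) / (1 + r1 * w / r2) ** ((r1 + r2) / 2)

def f_cdf(b, r1, r2, steps=20000):
    """Pr(F <= b) by the midpoint rule on (0, b)."""
    h = b / steps
    return h * sum(f_pdf((i + 0.5) * h, r1, r2) for i in range(steps))

numeric = f_cdf(1.5, 2, 6)
by_hand = 1 - (1 + 2 * 1.5 / 6) ** (-6 / 2)   # closed form valid only when r1 = 2
print(numeric, by_hand)
```

Agreement between the two printed numbers is a check on the normalizing constant and on the exponent $(r_1 + r_2)/2$ in the denominator.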


EXERCISES 


4.35. Find the mean and variance of the beta distribution.
Hint: From that p.d.f., we know that

$$\int_0^1 y^{\alpha-1}(1-y)^{\beta-1}\, dy = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$

for all $\alpha > 0$, $\beta > 0$.

4.36. Determine the constant c in each of the following so that each f(x) is
a beta p.d.f.

(a) $f(x) = cx(1-x)^3$, $0 < x < 1$, zero elsewhere.

(b) $f(x) = cx^4(1-x)^5$, $0 < x < 1$, zero elsewhere.

(c) $f(x) = cx^2(1-x)^8$, $0 < x < 1$, zero elsewhere.

4.37. Determine the constant c so that $f(x) = cx(3-x)^4$, $0 < x < 3$, zero
elsewhere, is a p.d.f.

4.38. Show that the graph of the beta p.d.f. is symmetric about the vertical
line through $x = \frac{1}{2}$ if $\alpha = \beta$.

4.39. Show, for $k = 1, 2, \ldots, n$, that

$$\int_p^1 \frac{n!}{(k-1)!\,(n-k)!}\, z^{k-1}(1-z)^{n-k}\, dz = \sum_{x=0}^{k-1} \binom{n}{x} p^x (1-p)^{n-x}.$$

This demonstrates the relationship between the distribution functions of the
beta and binomial distributions.


4.40. Let T have a t-distribution with 10 degrees of freedom. Find $\Pr(|T| > 2.228)$
from Table IV.

4.41. Let T have a t-distribution with 14 degrees of freedom. Determine b
so that $\Pr(-b < T < b) = 0.90$.

4.42. Let F have an F-distribution with parameters $r_1$ and $r_2$. Prove that 1/F
has an F-distribution with parameters $r_2$ and $r_1$.

4.43. If F has an F-distribution with parameters $r_1 = 5$ and $r_2 = 10$, find a
and b so that $\Pr(F \le a) = 0.05$ and $\Pr(F \le b) = 0.95$, and, accordingly,
$\Pr(a < F < b) = 0.90$.

Hint: Write $\Pr(F \le a) = \Pr(1/F \ge 1/a) = 1 - \Pr(1/F \le 1/a)$, and use
the result of Exercise 4.42 and Table V.

4.44. Let $T = W/\sqrt{V/r}$, where the independent variables W and V are,
respectively, normal with mean zero and variance 1 and chi-square with r
degrees of freedom. Show that $T^2$ has an F-distribution with parameters
$r_1 = 1$ and $r_2 = r$.

Hint: What is the distribution of the numerator of $T^2$?

4.45. Show that the t-distribution with r = 1 degree of freedom and the
Cauchy distribution are the same.

4.46. Show that

$$Y = \frac{1}{1 + (r_1/r_2)W},$$

where W has an F-distribution with parameters $r_1$ and $r_2$, has a beta
distribution.


4.47. Let $X_1, X_2$ be a random sample from a distribution having the p.d.f.
$f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that $Z = X_1/X_2$ has an
F-distribution.

4.5 Extensions of the Change-of-Variable Technique

In Section 4.3 it was seen that the determination of the joint
p.d.f. of two functions of two random variables of the continuous type was
essentially a corollary to a theorem in analysis having to do with the
change of variables in a twofold integral. This theorem has a natural
extension to n-fold integrals. This extension is as follows. Consider an
integral of the form

$$\int \cdots \int_{A} h(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$

taken over a subset A of an n-dimensional space $\mathcal{A}$. Let


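$$y_1 = u_1(x_1, x_2, \ldots, x_n), \quad y_2 = u_2(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = u_n(x_1, x_2, \ldots, x_n),$$

together with the inverse functions

$$x_1 = w_1(y_1, y_2, \ldots, y_n), \quad x_2 = w_2(y_1, y_2, \ldots, y_n), \quad \ldots, \quad x_n = w_n(y_1, y_2, \ldots, y_n)$$

define a one-to-one transformation that maps $\mathcal{A}$ onto $\mathcal{B}$ in the
$y_1, y_2, \ldots, y_n$ space (and hence maps the subset A of $\mathcal{A}$ onto a subset
B of $\mathcal{B}$). Let the first partial derivatives of the inverse functions
be continuous and let the n by n determinant (called the Jacobian)

$$
J = \begin{vmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{vmatrix}
$$

not be identically zero in $\mathcal{B}$. Then

$$
\int \cdots \int_{A} h(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n
= \int \cdots \int_{B} h[w_1(y_1, \ldots, y_n), w_2(y_1, \ldots, y_n), \ldots, w_n(y_1, \ldots, y_n)]\, |J|\, dy_1\, dy_2 \cdots dy_n.
$$

Whenever the conditions of this theorem are satisfied, we can determine
the joint p.d.f. of n functions of n random variables. Appropriate
changes of notation in Section 4.3 (to indicate n-space as opposed
to 2-space) are all that is needed to show that the joint p.d.f. of the
random variables $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, $Y_2 = u_2(X_1, X_2, \ldots, X_n)$,
$\ldots$, $Y_n = u_n(X_1, X_2, \ldots, X_n)$, where the joint p.d.f. of $X_1, X_2, \ldots, X_n$
is $h(x_1, \ldots, x_n)$, is given by

$$g(y_1, y_2, \ldots, y_n) = |J|\, h[w_1(y_1, \ldots, y_n), \ldots, w_n(y_1, \ldots, y_n)],$$

when $(y_1, y_2, \ldots, y_n) \in \mathcal{B}$, and is zero elsewhere.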

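Example 1. Let $X_1, X_2, \ldots, X_{k+1}$ be independent random variables, each
having a gamma distribution with $\beta = 1$. The joint p.d.f. of these variables
may be written as

$$
\begin{aligned}
h(x_1, x_2, \ldots, x_{k+1}) &= \prod_{i=1}^{k+1} \frac{1}{\Gamma(\alpha_i)}\, x_i^{\alpha_i-1} e^{-x_i}, \qquad 0 < x_i < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

Let

$$Y_i = \frac{X_i}{X_1 + X_2 + \cdots + X_{k+1}}, \qquad i = 1, 2, \ldots, k,$$

and $Y_{k+1} = X_1 + X_2 + \cdots + X_{k+1}$ denote k + 1 new random variables. The
associated transformation maps $\mathcal{A} = \{(x_1, \ldots, x_{k+1}) : 0 < x_i < \infty,\ i = 1, \ldots, k+1\}$
onto the space

$$\mathcal{B} = \{(y_1, \ldots, y_k, y_{k+1}) : 0 < y_i,\ i = 1, \ldots, k,\ y_1 + \cdots + y_k < 1,\ 0 < y_{k+1} < \infty\}.$$

The single-valued inverse functions are $x_1 = y_1 y_{k+1}, \ldots, x_k = y_k y_{k+1}$,
$x_{k+1} = y_{k+1}(1 - y_1 - \cdots - y_k)$, so that the Jacobian is

$$
J = \begin{vmatrix}
y_{k+1} & 0 & \cdots & 0 & y_1 \\
0 & y_{k+1} & \cdots & 0 & y_2 \\
\vdots & \vdots & & \vdots & \vdots \\
0 & 0 & \cdots & y_{k+1} & y_k \\
-y_{k+1} & -y_{k+1} & \cdots & -y_{k+1} & (1 - y_1 - \cdots - y_k)
\end{vmatrix} = y_{k+1}^{k}.
$$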

Hence the joint p.d.f. of $Y_1, \ldots, Y_k, Y_{k+1}$ is given by

$$\frac{y_{k+1}^{\alpha_1 + \cdots + \alpha_{k+1} - 1}\, y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1} (1 - y_1 - \cdots - y_k)^{\alpha_{k+1}-1} e^{-y_{k+1}}}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_k)\,\Gamma(\alpha_{k+1})},$$

provided that $(y_1, \ldots, y_k, y_{k+1}) \in \mathcal{B}$ and is equal to zero elsewhere. The joint
p.d.f. of $Y_1, \ldots, Y_k$ is seen by inspection to be given by

$$g(y_1, \ldots, y_k) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_{k+1})}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_{k+1})}\, y_1^{\alpha_1-1} \cdots y_k^{\alpha_k-1} (1 - y_1 - \cdots - y_k)^{\alpha_{k+1}-1},$$

when $0 < y_i$, $i = 1, \ldots, k$, $y_1 + \cdots + y_k < 1$, while the function g is equal to
zero elsewhere. Random variables $Y_1, \ldots, Y_k$ that have a joint p.d.f. of this
form are said to have a Dirichlet distribution with parameters $\alpha_1, \ldots, \alpha_k, \alpha_{k+1}$,
and any such $g(y_1, \ldots, y_k)$ is called a Dirichlet p.d.f. It is seen, in the special
case of k = 1, that the Dirichlet p.d.f. becomes a beta p.d.f. Moreover, it is
also clear from the joint p.d.f. of $Y_1, \ldots, Y_k, Y_{k+1}$ that $Y_{k+1}$ has a gamma
distribution with parameters $\alpha_1 + \cdots + \alpha_k + \alpha_{k+1}$ and $\beta = 1$ and that $Y_{k+1}$
is independent of $Y_1, Y_2, \ldots, Y_k$.
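
The construction in Example 1 is easy to simulate. The sketch below is an illustration only, not part of the original text: the parameter values (2, 3, 4), the sample size, the seed, and the helper names `dirichlet_sample` and `correlation` are all our own choices. It builds $(Y_1, \ldots, Y_k)$ and $Y_{k+1}$ from independent gamma variables with $\beta = 1$ and looks at two consequences stated in the example: the sum $Y_{k+1}$ is gamma with parameter $\alpha_1 + \cdots + \alpha_{k+1}$ (so its mean should be near 9 here), and $Y_{k+1}$ is independent of $Y_1$ (so their sample correlation should be near zero).

```python
import random

def dirichlet_sample(alphas, n, seed=0):
    """Draw n copies of ((Y_1, ..., Y_k), Y_{k+1}) from k + 1 independent gamma variables (beta = 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        xs = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(xs)                                   # Y_{k+1} = X_1 + ... + X_{k+1}
        draws.append(([x / total for x in xs[:-1]], total))
    return draws

def correlation(us, vs):
    """Sample correlation coefficient of two equally long lists."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    cov = sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / n
    su = (sum((u - mu) ** 2 for u in us) / n) ** 0.5
    sv = (sum((v - mv) ** 2 for v in vs) / n) ** 0.5
    return cov / (su * sv)

draws = dirichlet_sample((2.0, 3.0, 4.0), n=20000)
y1 = [ys[0] for ys, _ in draws]
totals = [t for _, t in draws]
print(sum(totals) / len(totals))   # should be near 2 + 3 + 4 = 9
print(correlation(y1, totals))     # should be near 0
```

A correlation near zero is only a necessary consequence of independence, so this is a plausibility check rather than a proof.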


We now consider some other problems that are encountered when
transforming variables. Let X have the Cauchy p.d.f.

$$f(x) = \frac{1}{\pi(1 + x^2)}, \qquad -\infty < x < \infty,$$

and let $Y = X^2$. We seek the p.d.f. g(y) of Y. Consider the
transformation $y = x^2$. This transformation maps the space of
X, $\mathcal{A} = \{x : -\infty < x < \infty\}$, onto $\mathcal{B} = \{y : 0 \le y < \infty\}$. However,
the transformation is not one-to-one. To each $y \in \mathcal{B}$, with the
exception of y = 0, there correspond two points $x \in \mathcal{A}$. For example,
if y = 4, we may have either x = 2 or x = -2. In such an instance,
we represent $\mathcal{A}$ as the union of two disjoint sets $A_1$ and $A_2$ such that
$y = x^2$ defines a one-to-one transformation that maps each of $A_1$
and $A_2$ onto $\mathcal{B}$. If we take $A_1$ to be $\{x : -\infty < x < 0\}$ and $A_2$ to be
$\{x : 0 \le x < \infty\}$, we see that $A_1$ is mapped onto $\{y : 0 < y < \infty\}$,
whereas $A_2$ is mapped onto $\{y : 0 \le y < \infty\}$, and these sets are not the
same. Our difficulty is caused by the fact that x = 0 is an element
of $\mathcal{A}$. Why, then, do we not return to the Cauchy p.d.f. and take
f(0) = 0? Then our new $\mathcal{A}$ is $\mathcal{A} = \{x : -\infty < x < \infty \text{ but } x \ne 0\}$. We
then take $A_1 = \{x : -\infty < x < 0\}$ and $A_2 = \{x : 0 < x < \infty\}$. Thus
$y = x^2$, with the inverse $x = -\sqrt{y}$, maps $A_1$ onto $\mathcal{B} = \{y : 0 < y < \infty\}$
and the transformation is one-to-one. Moreover, the transformation
$y = x^2$, with inverse $x = \sqrt{y}$, maps $A_2$ onto $\mathcal{B} = \{y : 0 < y < \infty\}$
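and the transformation is one-to-one. Consider the probability
$\Pr(Y \in B)$, where $B \subset \mathcal{B}$. Let $A_3 = \{x : x = -\sqrt{y},\ y \in B\} \subset A_1$ and
let $A_4 = \{x : x = \sqrt{y},\ y \in B\} \subset A_2$. Then $Y \in B$ when and only when
$X \in A_3$ or $X \in A_4$. Thus we have

$$\Pr(Y \in B) = \Pr(X \in A_3) + \Pr(X \in A_4) = \int_{A_3} f(x)\, dx + \int_{A_4} f(x)\, dx.$$

In the first of these integrals, let $x = -\sqrt{y}$. Thus the Jacobian, say $J_1$,
is $-1/(2\sqrt{y})$; moreover, the set $A_3$ is mapped onto B. In the second
integral let $x = \sqrt{y}$. Thus the Jacobian, say $J_2$, is $1/(2\sqrt{y})$; moreover,
the set $A_4$ is also mapped onto B. Finally,

$$
\begin{aligned}
\Pr(Y \in B) &= \int_B f(-\sqrt{y}) \left|-\frac{1}{2\sqrt{y}}\right| dy + \int_B f(\sqrt{y})\, \frac{1}{2\sqrt{y}}\, dy \\
&= \int_B \left[f(-\sqrt{y}) + f(\sqrt{y})\right] \frac{1}{2\sqrt{y}}\, dy.
\end{aligned}
$$

Hence the p.d.f. of Y is given by

$$g(y) = \frac{1}{2\sqrt{y}} \left[f(-\sqrt{y}) + f(\sqrt{y})\right], \qquad y \in \mathcal{B}.$$

With f(x) the Cauchy p.d.f. we have

$$
\begin{aligned}
g(y) &= \frac{1}{\pi(1 + y)\sqrt{y}}, \qquad 0 < y < \infty, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$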
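
The two-branch argument can be checked numerically. In the minimal Python sketch below (our own illustration; the names `g`, `g_from_branches`, and `empirical_prob`, the seed, and the sample size are assumptions, not part of the text), the closed form for g(y) is compared with the sum over the two inverse branches, and a simulation of $Y = X^2$ is compared with $\Pr(Y \le 1) = \Pr(-1 \le X \le 1) = \frac{1}{2}$, which follows from the Cauchy distribution function $\frac{1}{2} + \frac{1}{\pi}\arctan x$.

```python
import math
import random

def g(y):
    """Closed-form p.d.f. of Y = X^2 when X has the Cauchy p.d.f."""
    return 1.0 / (math.pi * (1.0 + y) * math.sqrt(y)) if y > 0 else 0.0

def g_from_branches(y):
    """Same p.d.f. assembled from the inverse branches x = -sqrt(y) and x = sqrt(y)."""
    f = lambda x: 1.0 / (math.pi * (1.0 + x * x))
    root = math.sqrt(y)
    return (f(-root) + f(root)) / (2.0 * root)     # |J_1| = |J_2| = 1 / (2 sqrt(y))

def empirical_prob(y0, n=100000, seed=0):
    """Fraction of simulated values of Y = X^2 that are at most y0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = math.tan(math.pi * (rng.random() - 0.5))   # inverse-c.d.f. draw from the Cauchy
        hits += (x * x <= y0)
    return hits / n

print(g(4.0), g_from_branches(4.0))   # the two forms should agree
print(empirical_prob(1.0))            # should be near 0.5
```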

In the preceding discussion of a random variable of the continuous
type, we had two inverse functions, $x = -\sqrt{y}$ and $x = \sqrt{y}$. That is
why we sought to partition $\mathcal{A}$ (or a modification of $\mathcal{A}$) into two disjoint
subsets such that the transformation $y = x^2$ maps each onto the same
$\mathcal{B}$. Had there been three inverse functions, we would have sought to
partition $\mathcal{A}$ (or a modified form of $\mathcal{A}$) into three disjoint subsets, and
so on. It is hoped that this detailed discussion will make the following
paragraph easier to read.

Let $h(x_1, x_2, \ldots, x_n)$ be the joint p.d.f. of $X_1, X_2, \ldots, X_n$, which
are random variables of the continuous type. Let $\mathcal{A}$ be the
n-dimensional space where $h(x_1, x_2, \ldots, x_n) > 0$, and consider the
transformation $y_1 = u_1(x_1, x_2, \ldots, x_n)$, $y_2 = u_2(x_1, x_2, \ldots, x_n)$, $\ldots$,
$y_n = u_n(x_1, x_2, \ldots, x_n)$, which maps $\mathcal{A}$ onto $\mathcal{B}$ in the $y_1, y_2, \ldots, y_n$
space. To each point of $\mathcal{A}$ there will correspond, of course, but one
point in $\mathcal{B}$; but to a point in $\mathcal{B}$ there may correspond more than one
point in $\mathcal{A}$. That is, the transformation may not be one-to-one.
Suppose, however, that we can represent $\mathcal{A}$ as the union of a finite
number, say k, of mutually disjoint sets $A_1, A_2, \ldots, A_k$ so that

$$y_1 = u_1(x_1, x_2, \ldots, x_n), \quad \ldots, \quad y_n = u_n(x_1, x_2, \ldots, x_n)$$

define a one-to-one transformation of each $A_i$ onto $\mathcal{B}$. Thus, to each
point in $\mathcal{B}$ there will correspond exactly one point in each of
$A_1, A_2, \ldots, A_k$. Let

$$
\begin{aligned}
x_1 &= w_{1i}(y_1, y_2, \ldots, y_n), \\
x_2 &= w_{2i}(y_1, y_2, \ldots, y_n), \\
&\ \ \vdots \\
x_n &= w_{ni}(y_1, y_2, \ldots, y_n),
\end{aligned}
\qquad i = 1, 2, \ldots, k,
$$

denote the k groups of n inverse functions, one group for each of these
k transformations. Let the first partial derivatives be continuous and
let each

$$
J_i = \begin{vmatrix}
\dfrac{\partial w_{1i}}{\partial y_1} & \dfrac{\partial w_{1i}}{\partial y_2} & \cdots & \dfrac{\partial w_{1i}}{\partial y_n} \\
\dfrac{\partial w_{2i}}{\partial y_1} & \dfrac{\partial w_{2i}}{\partial y_2} & \cdots & \dfrac{\partial w_{2i}}{\partial y_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial w_{ni}}{\partial y_1} & \dfrac{\partial w_{ni}}{\partial y_2} & \cdots & \dfrac{\partial w_{ni}}{\partial y_n}
\end{vmatrix}, \qquad i = 1, 2, \ldots, k,
$$

be not identically equal to zero in $\mathcal{B}$. From a consideration of the
probability of the union of k mutually exclusive events and by applying
the change of variable technique to the probability of each of these
events, it can be seen that the joint p.d.f. of $Y_1 = u_1(X_1, X_2, \ldots, X_n)$,
$Y_2 = u_2(X_1, X_2, \ldots, X_n)$, $\ldots$, $Y_n = u_n(X_1, X_2, \ldots, X_n)$, is given by

$$g(y_1, y_2, \ldots, y_n) = \sum_{i=1}^{k} |J_i|\, h[w_{1i}(y_1, \ldots, y_n), \ldots, w_{ni}(y_1, \ldots, y_n)],$$

provided that $(y_1, y_2, \ldots, y_n) \in \mathcal{B}$, and equals zero elsewhere. The
p.d.f. of any $Y_i$, say $Y_1$, is then

$$g_1(y_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_n)\, dy_2 \cdots dy_n.$$

An illustrative example follows.

Example 2. To illustrate the result just obtained, take n = 2 and let $X_1, X_2$
denote a random sample of size 2 from a standard normal distribution. The
joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1, x_2) = \frac{1}{2\pi} \exp\!\left(-\frac{x_1^2 + x_2^2}{2}\right), \qquad -\infty < x_1 < \infty,\ -\infty < x_2 < \infty.$$

Let $Y_1$ denote the mean and let $Y_2$ denote twice the variance of the random
sample. The associated transformation is

$$y_1 = \frac{x_1 + x_2}{2}, \qquad y_2 = \frac{(x_1 - x_2)^2}{2}.$$

This transformation maps $\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty\}$ onto
$\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ 0 \le y_2 < \infty\}$. But the transformation is not
one-to-one because, to each point in $\mathcal{B}$, exclusive of points where $y_2 = 0$, there
correspond two points in $\mathcal{A}$. In fact, the two groups of inverse functions are

$$x_1 = y_1 - \sqrt{\frac{y_2}{2}}, \qquad x_2 = y_1 + \sqrt{\frac{y_2}{2}}$$

and

$$x_1 = y_1 + \sqrt{\frac{y_2}{2}}, \qquad x_2 = y_1 - \sqrt{\frac{y_2}{2}}.$$

Moreover, the set $\mathcal{A}$ cannot be represented as the union of two disjoint sets,
each of which under our transformation maps onto $\mathcal{B}$. Our difficulty is caused
by those points of $\mathcal{A}$ that lie on the line whose equation is $x_2 = x_1$. At each
of these points, we have $y_2 = 0$. However, we can define $f(x_1, x_2)$ to be zero
at each point where $x_1 = x_2$. We can do this without altering the distribution
of probability, because the probability measure of this set is zero. Thus
we have a new $\mathcal{A} = \{(x_1, x_2) : -\infty < x_1 < \infty,\ -\infty < x_2 < \infty, \text{ but } x_1 \ne x_2\}$.
This space is the union of the two disjoint sets $A_1 = \{(x_1, x_2) : x_2 > x_1\}$
and $A_2 = \{(x_1, x_2) : x_2 < x_1\}$. Moreover, our transformation now defines
a one-to-one transformation of each $A_i$, i = 1, 2, onto the new
$\mathcal{B} = \{(y_1, y_2) : -\infty < y_1 < \infty,\ 0 < y_2 < \infty\}$. We can now find the joint p.d.f., say
$g(y_1, y_2)$, of the mean $Y_1$ and twice the variance $Y_2$ of our random sample.
An easy computation shows that $|J_1| = |J_2| = 1/\sqrt{2y_2}$. Thus

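$$
\begin{aligned}
g(y_1, y_2) &= \frac{1}{2\pi} \exp\!\left[-\frac{(y_1 - \sqrt{y_2/2})^2}{2} - \frac{(y_1 + \sqrt{y_2/2})^2}{2}\right] \frac{1}{\sqrt{2y_2}} \\
&\quad + \frac{1}{2\pi} \exp\!\left[-\frac{(y_1 + \sqrt{y_2/2})^2}{2} - \frac{(y_1 - \sqrt{y_2/2})^2}{2}\right] \frac{1}{\sqrt{2y_2}} \\
&= \sqrt{\frac{2}{2\pi}}\, e^{-y_1^2}\, \frac{1}{\sqrt{2}\,\Gamma(\tfrac{1}{2})}\, y_2^{1/2-1} e^{-y_2/2}, \qquad -\infty < y_1 < \infty,\ 0 < y_2 < \infty.
\end{aligned}
$$

We can make three interesting observations. The mean $Y_1$ of our random
sample is $N(0, \frac{1}{2})$; $Y_2$, which is twice the variance of our sample, is $\chi^2(1)$; and
the two are independent. Thus the mean and the variance of our sample are
independent.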
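
The three observations in Example 2 can be examined by simulation. The following minimal Python sketch is our own illustration (the function names, seed, and sample size are assumptions): it draws samples of size 2 from N(0, 1), forms $Y_1 = (X_1 + X_2)/2$ and $Y_2 = (X_1 - X_2)^2/2$, and reports the sample variance of $Y_1$ (which should be near $\frac{1}{2}$), the sample mean of $Y_2$ (near 1, the mean of $\chi^2(1)$), and their sample correlation (near 0).

```python
import random

def simulate(n=50000, seed=0):
    """Return lists of Y_1 (mean) and Y_2 (twice the variance) for n samples of size 2."""
    rng = random.Random(seed)
    y1s, y2s = [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y1s.append((x1 + x2) / 2.0)
        y2s.append((x1 - x2) ** 2 / 2.0)
    return y1s, y2s

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

def correlation(u, v):
    mu, mv = mean(u), mean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)
    return cov / (variance(u) * variance(v)) ** 0.5

y1s, y2s = simulate()
print(variance(y1s))          # should be near 1/2
print(mean(y2s))              # should be near 1
print(correlation(y1s, y2s))  # should be near 0
```

Zero correlation does not by itself establish independence; the factorization of $g(y_1, y_2)$ in the example is what proves it, and the simulation merely agrees with that result.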


EXERCISES 


4.48. Let $X_1, X_2, X_3$ denote a random sample from a standard normal
distribution. Let the random variables $Y_1, Y_2, Y_3$ be defined by

$$X_1 = Y_1 \cos Y_2 \sin Y_3, \qquad X_2 = Y_1 \sin Y_2 \sin Y_3, \qquad X_3 = Y_1 \cos Y_3,$$

where $0 \le Y_1 < \infty$, $0 \le Y_2 < 2\pi$, $0 \le Y_3 \le \pi$. Show that $Y_1, Y_2, Y_3$ are
mutually independent.

4.49. Let $X_1, X_2, X_3$ be i.i.d., each with the distribution having p.d.f.
$f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that

$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = X_1 + X_2 + X_3$$

are mutually independent.

4.50. Let $X_1, X_2, \ldots, X_r$ be r independent gamma variables with parameters
$\alpha = \alpha_i$ and $\beta = 1$, $i = 1, 2, \ldots, r$, respectively. Show that
$Y_1 = X_1 + X_2 + \cdots + X_r$ has a gamma distribution with parameters
$\alpha = \alpha_1 + \cdots + \alpha_r$ and $\beta = 1$.

Hint: Let $Y_2 = X_2 + \cdots + X_r$, $Y_3 = X_3 + \cdots + X_r$, $\ldots$, $Y_r = X_r$.

4.51. Let $Y_1, \ldots, Y_k$ have a Dirichlet distribution with parameters
$\alpha_1, \ldots, \alpha_k, \alpha_{k+1}$.

(a) Show that $Y_1$ has a beta distribution with parameters $\alpha = \alpha_1$ and
$\beta = \alpha_2 + \cdots + \alpha_{k+1}$.

(b) Show that $Y_1 + \cdots + Y_r$, $r \le k$, has a beta distribution with parameters
$\alpha = \alpha_1 + \cdots + \alpha_r$ and $\beta = \alpha_{r+1} + \cdots + \alpha_{k+1}$.

(c) Show that $Y_1 + Y_2$, $Y_3 + Y_4$, $Y_5, \ldots, Y_k$, $k \ge 5$, have a Dirichlet
distribution with parameters $\alpha_1 + \alpha_2$, $\alpha_3 + \alpha_4$, $\alpha_5, \ldots, \alpha_k, \alpha_{k+1}$.

Hint: Recall the definition of $Y_i$ in Example 1 and use the fact that
the sum of several independent gamma variables with $\beta = 1$ is a gamma
variable (Exercise 4.50).

4.52. Let $X_1$, $X_2$, and $X_3$ be three independent chi-square variables with $r_1$, $r_2$,
and $r_3$ degrees of freedom, respectively.

(a) Show that $Y_1 = X_1/X_2$ and $Y_2 = X_1 + X_2$ are independent and that $Y_2$
is $\chi^2(r_1 + r_2)$.

(b) Deduce that

$$\frac{X_1/r_1}{X_2/r_2} \qquad \text{and} \qquad \frac{X_3/r_3}{(X_1 + X_2)/(r_1 + r_2)}$$

are independent F-variables.

4.53. If $f(x) = \frac{1}{2}$, $-1 < x < 1$, zero elsewhere, is the p.d.f. of the random
variable X, find the p.d.f. of $Y = X^2$.

4.54. If $X_1, X_2$ is a random sample from a standard normal distribution,
find the joint p.d.f. of $Y_1 = X_1^2 + X_2^2$ and $Y_2 = X_2$ and the marginal p.d.f.
of $Y_1$.

Hint: Note that the space of $Y_1$ and $Y_2$ is given by $-\sqrt{y_1} < y_2 < \sqrt{y_1}$,
$0 < y_1 < \infty$.

4.55. If X has the p.d.f. $f(x) = \frac{1}{4}$, $-1 < x < 3$, zero elsewhere, find the p.d.f.
of $Y = X^2$.

Hint: Here $\mathcal{B} = \{y : 0 \le y < 9\}$ and the event $Y \in B$ is the union of two
mutually exclusive events if $B = \{y : 0 < y < 1\}$.

4.6 Distributions of Order Statistics

In this section the notion of an order statistic will be defined and
we shall investigate some of the simpler properties of such a statistic.
These statistics have in recent times come to play an important role
in statistical inference partly because some of their properties do
not depend upon the distribution from which the random sample is
obtained.

Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution of
the continuous type having a p.d.f. f(x) that is positive, provided that
a < x < b. Let $Y_1$ be the smallest of these $X_i$, $Y_2$ the next $X_i$ in order
of magnitude, $\ldots$, and $Y_n$ the largest $X_i$. That is, $Y_1 < Y_2 < \cdots < Y_n$
represent $X_1, X_2, \ldots, X_n$ when the latter are arranged in ascending
order of magnitude. Then $Y_i$, $i = 1, 2, \ldots, n$, is called the ith order
statistic of the random sample $X_1, X_2, \ldots, X_n$. It will be shown that the
joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is given by

$$
\begin{aligned}
g(y_1, y_2, \ldots, y_n) &= (n!)\, f(y_1) f(y_2) \cdots f(y_n), \qquad a < y_1 < y_2 < \cdots < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
\tag{1}
$$

We shall prove this only for the case n = 3, but the argument is seen
to be entirely general. With n = 3, the joint p.d.f. of $X_1, X_2, X_3$ is
$f(x_1) f(x_2) f(x_3)$. Consider a probability such as $\Pr(a < X_1 = X_2 < b,\ a < X_3 < b)$.
This probability is given by

$$\int_a^b \int_a^b \int_{x_2}^{x_2} f(x_1) f(x_2) f(x_3)\, dx_1\, dx_2\, dx_3 = 0,$$

since

$$\int_{x_2}^{x_2} f(x_1)\, dx_1$$

is defined in calculus to be zero. As has been pointed out, we may,
without altering the distribution of $X_1, X_2, X_3$, define the joint
p.d.f. $f(x_1) f(x_2) f(x_3)$ to be zero at all points $(x_1, x_2, x_3)$ that have
at least two of their coordinates equal. Then the set $\mathcal{A}$, where
$f(x_1) f(x_2) f(x_3) > 0$, is the union of the six mutually disjoint sets:

$$
\begin{aligned}
A_1 &= \{(x_1, x_2, x_3) : a < x_1 < x_2 < x_3 < b\}, \\
A_2 &= \{(x_1, x_2, x_3) : a < x_2 < x_1 < x_3 < b\}, \\
A_3 &= \{(x_1, x_2, x_3) : a < x_1 < x_3 < x_2 < b\}, \\
A_4 &= \{(x_1, x_2, x_3) : a < x_2 < x_3 < x_1 < b\}, \\
A_5 &= \{(x_1, x_2, x_3) : a < x_3 < x_1 < x_2 < b\}, \\
A_6 &= \{(x_1, x_2, x_3) : a < x_3 < x_2 < x_1 < b\}.
\end{aligned}
$$

There are six of these sets because we can arrange $x_1, x_2, x_3$ in
precisely 3! = 6 ways. Consider the functions $y_1$ = minimum of
$x_1, x_2, x_3$; $y_2$ = middle in magnitude of $x_1, x_2, x_3$; and $y_3$ = maximum
of $x_1, x_2, x_3$. These functions define one-to-one transformations
that map each of $A_1, A_2, \ldots, A_6$ onto the same set $\mathcal{B} = \{(y_1, y_2, y_3) :
a < y_1 < y_2 < y_3 < b\}$. The inverse functions are, for points in $A_1$,
$x_1 = y_1$, $x_2 = y_2$, $x_3 = y_3$; for points in $A_2$, they are $x_1 = y_2$, $x_2 = y_1$,
$x_3 = y_3$; and so on, for each of the remaining four sets. Then we have
that

$$
J_1 = \begin{vmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = 1
\qquad \text{and} \qquad
J_2 = \begin{vmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -1.
$$

It is easily verified that the absolute value of each of the 3! = 6
Jacobians is +1. Thus the joint p.d.f. of the three order statistics
$Y_1$ = minimum of $X_1, X_2, X_3$; $Y_2$ = middle in magnitude of $X_1, X_2, X_3$;
$Y_3$ = maximum of $X_1, X_2, X_3$ is

$$
\begin{aligned}
g(y_1, y_2, y_3) &= |J_1|\, f(y_1) f(y_2) f(y_3) + |J_2|\, f(y_2) f(y_1) f(y_3) + \cdots \\
&\quad + |J_6|\, f(y_3) f(y_2) f(y_1), \qquad a < y_1 < y_2 < y_3 < b, \\
&= (3!)\, f(y_1) f(y_2) f(y_3), \qquad a < y_1 < y_2 < y_3 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

This is Equation (1) with n = 3.

In accordance with the natural extension of Theorem 1, Section 2.4,
to distributions of more than two random variables, it is seen that the
order statistics, unlike the items of the random sample, are dependent.

Example 1. Let X denote a random variable of the continuous type with
a p.d.f. f(x) that is positive and continuous, provided that a < x < b and
is zero elsewhere. The distribution function F(x) of X may be written

$$F(x) = \int_a^x f(w)\, dw, \qquad a < x < b.$$

If $x \le a$, F(x) = 0; and if $b \le x$, F(x) = 1. Thus there is a unique median m
of the distribution with $F(m) = \frac{1}{2}$. Let $X_1, X_2, X_3$ denote a random sample from
this distribution and let $Y_1 < Y_2 < Y_3$ denote the order statistics of the sample.
We shall compute the probability that $Y_2 \le m$. The joint p.d.f. of the three
order statistics is

$$
\begin{aligned}
g(y_1, y_2, y_3) &= 6 f(y_1) f(y_2) f(y_3), \qquad a < y_1 < y_2 < y_3 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

The p.d.f. of $Y_2$ is then

$$
\begin{aligned}
h(y_2) &= 6 f(y_2) \int_{y_2}^{b} \int_a^{y_2} f(y_1) f(y_3)\, dy_1\, dy_3 \\
&= 6 f(y_2) F(y_2) [1 - F(y_2)], \qquad a < y_2 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

Accordingly,

$$
\begin{aligned}
\Pr(Y_2 \le m) &= 6 \int_a^m \{F(y_2) f(y_2) - [F(y_2)]^2 f(y_2)\}\, dy_2 \\
&= 6 \left\{\frac{[F(y_2)]^2}{2} - \frac{[F(y_2)]^3}{3}\right\}_a^m = \frac{1}{2}.
\end{aligned}
$$
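
The answer $\frac{1}{2}$ does not depend on the particular f(x), which can be illustrated with a short simulation. The Python sketch below is an illustration only: the choice $f(x) = e^{-x}$, $0 < x < \infty$ (whose median is $m = \ln 2$), the seed, the sample size, and the function names are our own assumptions. It evaluates the closed form $6\{F^2/2 - F^3/3\}$ at $F(m) = \frac{1}{2}$ and compares it with the observed fraction of samples of size 3 whose middle value is at most m.

```python
import math
import random

def exact_prob(F_m=0.5):
    """6 * (F^2/2 - F^3/3) evaluated at F(m)."""
    return 6 * (F_m ** 2 / 2 - F_m ** 3 / 3)

def simulated_prob(n=100000, seed=0):
    """Fraction of samples of size 3 from f(x) = e^{-x} whose middle order statistic is at most m."""
    rng = random.Random(seed)
    m = math.log(2.0)                  # F(m) = 1 - e^{-m} = 1/2
    hits = 0
    for _ in range(n):
        y = sorted(rng.expovariate(1.0) for _ in range(3))
        hits += y[1] <= m              # y[1] is the observed value of Y_2
    return hits / n

print(exact_prob(), simulated_prob())
```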


The procedure used in Example 1 can be used to obtain general
formulas for the marginal probability density functions of the order
statistics. We shall do this now. Let X denote a random variable of the
continuous type having a p.d.f. f(x) that is positive and continuous,
provided that a < x < b, and is zero elsewhere. Then the distribution
function F(x) may be written

$$
\begin{aligned}
F(x) &= 0, \qquad x \le a, \\
&= \int_a^x f(w)\, dw, \qquad a < x < b, \\
&= 1, \qquad b \le x.
\end{aligned}
$$

Accordingly, $F'(x) = f(x)$, a < x < b. Moreover, if a < x < b,

$$
\begin{aligned}
1 - F(x) &= F(b) - F(x) \\
&= \int_a^b f(w)\, dw - \int_a^x f(w)\, dw \\
&= \int_x^b f(w)\, dw.
\end{aligned}
$$

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size n from this
distribution, and let $Y_1, Y_2, \ldots, Y_n$ denote the order statistics of this
random sample. Then the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is

$$
\begin{aligned}
g(y_1, y_2, \ldots, y_n) &= n!\, f(y_1) f(y_2) \cdots f(y_n), \qquad a < y_1 < y_2 < \cdots < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

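It will first be shown how the marginal p.d.f. of $Y_n$ may be expressed
in terms of the distribution function F(x) and the p.d.f. f(x) of the
random variable X. If $a < y_n < b$, the marginal p.d.f. of $Y_n$ is given by

$$
\begin{aligned}
g_n(y_n) &= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} \int_a^{y_2} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_1\, dy_2\, dy_3 \cdots dy_{n-1} \\
&= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} n! \left(\int_a^{y_2} f(y_1)\, dy_1\right) f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_{n-1} \\
&= \int_a^{y_n} \cdots \int_a^{y_4} \int_a^{y_3} n!\, F(y_2) f(y_2) \cdots f(y_n)\, dy_2 \cdots dy_{n-1},
\end{aligned}
$$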


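since $F(x) = \int_a^x f(w)\, dw$. Now

$$\int_a^{y_3} F(y_2) f(y_2)\, dy_2 = \left.\frac{[F(y_2)]^2}{2}\right|_a^{y_3} = \frac{[F(y_3)]^2}{2},$$

since F(a) = 0. Thus

$$g_n(y_n) = \int_a^{y_n} \cdots \int_a^{y_4} n!\, \frac{[F(y_3)]^2}{2}\, f(y_3) \cdots f(y_n)\, dy_3 \cdots dy_{n-1}.$$

But

$$\int_a^{y_4} \frac{[F(y_3)]^2}{2}\, f(y_3)\, dy_3 = \left.\frac{[F(y_3)]^3}{2 \cdot 3}\right|_a^{y_4} = \frac{[F(y_4)]^3}{2 \cdot 3},$$

so

$$g_n(y_n) = \int_a^{y_n} \cdots \int_a^{y_5} n!\, \frac{[F(y_4)]^3}{3!}\, f(y_4) \cdots f(y_n)\, dy_4 \cdots dy_{n-1}.$$

If the successive integrations on $y_4, \ldots, y_{n-1}$ are carried out, it is seen
that

$$
\begin{aligned}
g_n(y_n) &= n!\, \frac{[F(y_n)]^{n-1}}{(n-1)!}\, f(y_n) \\
&= n [F(y_n)]^{n-1} f(y_n), \qquad a < y_n < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$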

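It will next be shown how to express the marginal p.d.f. of $Y_1$ in
terms of F(x) and f(x). We have, for $a < y_1 < b$,

$$
\begin{aligned}
g_1(y_1) &= \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} \int_{y_{n-2}}^{b} \int_{y_{n-1}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_n\, dy_{n-1} \cdots dy_2 \\
&= \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} \int_{y_{n-2}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_{n-1}) [1 - F(y_{n-1})]\, dy_{n-1} \cdots dy_2.
\end{aligned}
$$

But

$$\int_{y_{n-2}}^{b} [1 - F(y_{n-1})] f(y_{n-1})\, dy_{n-1} = \left. -\frac{[1 - F(y_{n-1})]^2}{2} \right|_{y_{n-2}}^{b} = \frac{[1 - F(y_{n-2})]^2}{2},$$

so that

$$g_1(y_1) = \int_{y_1}^{b} \cdots \int_{y_{n-3}}^{b} n!\, f(y_1) \cdots f(y_{n-2})\, \frac{[1 - F(y_{n-2})]^2}{2}\, dy_{n-2} \cdots dy_2.$$

Upon completing the integrations, it is found that

$$
\begin{aligned}
g_1(y_1) &= n [1 - F(y_1)]^{n-1} f(y_1), \qquad a < y_1 < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$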


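Once it is observed that

$$\int_a^x [F(w)]^{\alpha-1} f(w)\, dw = \frac{[F(x)]^{\alpha}}{\alpha}, \qquad \alpha > 0,$$

and that

$$\int_y^b [1 - F(w)]^{\beta-1} f(w)\, dw = \frac{[1 - F(y)]^{\beta}}{\beta}, \qquad \beta > 0,$$

it is easy to express the marginal p.d.f. of any order statistic, say $Y_k$,
in terms of F(x) and f(x). This is done by evaluating the integral

$$g_k(y_k) = \int_a^{y_k} \cdots \int_a^{y_2} \int_{y_k}^{b} \cdots \int_{y_{n-1}}^{b} n!\, f(y_1) f(y_2) \cdots f(y_n)\, dy_n \cdots dy_{k+1}\, dy_1 \cdots dy_{k-1}.$$

The result is

$$
\begin{aligned}
g_k(y_k) &= \frac{n!}{(k-1)!\,(n-k)!}\, [F(y_k)]^{k-1} [1 - F(y_k)]^{n-k} f(y_k), \qquad a < y_k < b, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
\tag{2}
$$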

Example 2. Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random
sample of size 4 from a distribution having p.d.f.

$$
\begin{aligned}
f(x) &= 2x, \qquad 0 < x < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

We shall express the p.d.f. of $Y_3$ in terms of f(x) and F(x) and then compute
$\Pr(\frac{1}{2} < Y_3)$. Here $F(x) = x^2$, provided that 0 < x < 1, so that

$$
\begin{aligned}
g_3(y_3) &= \frac{4!}{2!\,1!}\, (y_3^2)^2 (1 - y_3^2)(2y_3), \qquad 0 < y_3 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

Thus

$$\Pr(\tfrac{1}{2} < Y_3) = \int_{1/2}^{\infty} g_3(y_3)\, dy_3 = \int_{1/2}^{1} 24 (y_3^5 - y_3^7)\, dy_3 = \frac{243}{256}.$$
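
Formula (2) translates directly into code. The following minimal Python sketch is our own illustration (the helper names `order_stat_pdf` and `integrate` are assumptions, and Simpson's rule is one convenient quadrature among many). It evaluates $g_k(y_k)$ for a user-supplied f and F and reproduces the probability of Example 2 by numerical integration.

```python
import math

def order_stat_pdf(y, k, n, f, F):
    """Formula (2): p.d.f. of the kth order statistic of a random sample of size n."""
    coef = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    return coef * F(y) ** (k - 1) * (1.0 - F(y)) ** (n - k) * f(y)

def integrate(func, lo, hi, steps=2000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = func(lo) + func(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * func(lo + i * h)
    return total * h / 3.0

f = lambda x: 2.0 * x      # p.d.f. of Example 2 on 0 < x < 1
F = lambda x: x * x        # its distribution function on 0 < x < 1

p = integrate(lambda y: order_stat_pdf(y, 3, 4, f, F), 0.5, 1.0)
print(p, 243 / 256)        # the two values should agree closely
```

Passing f and F as functions keeps the formula separate from any particular distribution, so the same helper can be reused for other values of k and n.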


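Finally, the joint p.d.f. of any two order statistics, say $Y_i < Y_j$, is
as easily expressed in terms of F(x) and f(x). We have

$$
g_{ij}(y_i, y_j) = \int_a^{y_i} \cdots \int_a^{y_2} \int_{y_i}^{y_j} \cdots \int_{y_{j-2}}^{y_j} \int_{y_j}^{b} \cdots \int_{y_{n-1}}^{b} n!\, f(y_1) \cdots f(y_n)\, dy_n \cdots dy_{j+1}\, dy_{j-1} \cdots dy_{i+1}\, dy_1 \cdots dy_{i-1}.
$$

Since, for $\gamma > 0$,

$$
\begin{aligned}
\int_x^y [F(y) - F(w)]^{\gamma-1} f(w)\, dw &= \left. -\frac{[F(y) - F(w)]^{\gamma}}{\gamma} \right|_x^y \\
&= \frac{[F(y) - F(x)]^{\gamma}}{\gamma},
\end{aligned}
$$

it is found that

$$
g_{ij}(y_i, y_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\, [F(y_i)]^{i-1} [F(y_j) - F(y_i)]^{j-i-1} [1 - F(y_j)]^{n-j} f(y_i) f(y_j)
\tag{3}
$$

for $a < y_i < y_j < b$, and zero elsewhere.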


Remark. There is an easy method of remembering a p.d.f. like that given
in Formula (3). The probability $\Pr(y_i < Y_i < y_i + \Delta_i,\ y_j < Y_j < y_j + \Delta_j)$,
where $\Delta_i$ and $\Delta_j$ are small, can be approximated by the following multinomial
probability. In n independent trials, i - 1 outcomes must be less than $y_i$
(an event that has probability $p_1 = F(y_i)$ on each trial); j - i - 1 outcomes
must be between $y_i + \Delta_i$ and $y_j$ [an event with approximate probability
$p_2 = F(y_j) - F(y_i)$ on each trial]; n - j outcomes must be greater than $y_j + \Delta_j$
(an event with approximate probability $p_3 = 1 - F(y_j)$ on each trial); one
outcome must be between $y_i$ and $y_i + \Delta_i$ (an event with approximate
probability $p_4 = f(y_i)\Delta_i$ on each trial); and finally one outcome must be
between $y_j$ and $y_j + \Delta_j$ [an event with approximate probability $p_5 = f(y_j)\Delta_j$
on each trial]. This multinomial probability is

$$\frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!\,1!\,1!}\, p_1^{i-1} p_2^{j-i-1} p_3^{n-j} p_4 p_5,$$

which is $g_{ij}(y_i, y_j)\Delta_i \Delta_j$.

Certain functions of the order statistics $Y_1, Y_2, \ldots, Y_n$ are
important statistics themselves. A few of these are: (a) $Y_n - Y_1$, which
is called the range of the random sample; (b) $(Y_1 + Y_n)/2$, which is
called the midrange of the random sample; and (c) if n is odd, $Y_{(n+1)/2}$,
which is called the median of the random sample.

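Example 3. Let $Y_1, Y_2, Y_3$ be the order statistics of a random sample of
size 3 from a distribution having p.d.f.

$$
\begin{aligned}
f(x) &= 1, \qquad 0 < x < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

We seek the p.d.f. of the sample range $Z_1 = Y_3 - Y_1$. Since F(x) = x,
0 < x < 1, the joint p.d.f. of $Y_1$ and $Y_3$ is

$$
\begin{aligned}
g_{13}(y_1, y_3) &= 6(y_3 - y_1), \qquad 0 < y_1 < y_3 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

In addition to $Z_1 = Y_3 - Y_1$, let $Z_2 = Y_3$. Consider the functions $z_1 = y_3 - y_1$,
$z_2 = y_3$, and their inverses $y_1 = z_2 - z_1$, $y_3 = z_2$, so that the corresponding
Jacobian of the one-to-one transformation is

$$
J = \begin{vmatrix}
\dfrac{\partial y_1}{\partial z_1} & \dfrac{\partial y_1}{\partial z_2} \\
\dfrac{\partial y_3}{\partial z_1} & \dfrac{\partial y_3}{\partial z_2}
\end{vmatrix}
= \begin{vmatrix} -1 & 1 \\ 0 & 1 \end{vmatrix} = -1.
$$

Thus the joint p.d.f. of $Z_1$ and $Z_2$ is

$$
\begin{aligned}
h(z_1, z_2) &= |-1|\, 6z_1 = 6z_1, \qquad 0 < z_1 < z_2 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$

Accordingly, the p.d.f. of the range $Z_1 = Y_3 - Y_1$ of the random sample of
size 3 is

$$
\begin{aligned}
h_1(z_1) &= \int_{z_1}^{1} 6z_1\, dz_2 = 6z_1(1 - z_1), \qquad 0 < z_1 < 1, \\
&= 0 \quad \text{elsewhere.}
\end{aligned}
$$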
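
The p.d.f. of the range found in Example 3 can be compared with a simulation. In the minimal Python sketch below (an illustration under our own assumptions: the function names, the cutoff 0.3, the seed, and the sample size are arbitrary), integrating $6z_1(1 - z_1)$ from 0 to z gives the distribution function $3z^2 - 2z^3$, which is compared with the observed fraction of simulated ranges that do not exceed z.

```python
import random

def range_cdf(z):
    """Integral of 6 z_1 (1 - z_1) from 0 to z, for 0 <= z <= 1."""
    return 3.0 * z ** 2 - 2.0 * z ** 3

def simulated_cdf(z, n=100000, seed=0):
    """Fraction of samples of size 3 from the uniform (0, 1) distribution with range at most z."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        xs = [rng.random() for _ in range(3)]
        hits += (max(xs) - min(xs)) <= z     # observed value of Z_1 = Y_3 - Y_1
    return hits / n

print(range_cdf(0.3), simulated_cdf(0.3))    # both should be near 0.216
```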
EXERCISES

4.56. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size
4 from the distribution having p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere.
Find $\Pr(3 \le Y_4)$.

4.57. Let $X_1, X_2, X_3$ be a random sample from a distribution of the continuous
type having p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere.

(a) Compute the probability that the smallest of these $X_i$ exceeds the
median of the distribution.

(b) If $Y_1 < Y_2 < Y_3$ are the order statistics, find the correlation between $Y_2$
and $Y_3$.

4.58. Let $f(x) = \frac{1}{6}$, $x = 1, 2, 3, 4, 5, 6$, zero elsewhere, be the p.d.f. of a
distribution of the discrete type. Show that the p.d.f. of the smallest
observation of a random sample of size 5 from this distribution is

$$g_1(y_1) = \left(\frac{7 - y_1}{6}\right)^5 - \left(\frac{6 - y_1}{6}\right)^5, \qquad y_1 = 1, 2, \ldots, 6,$$

zero elsewhere. Note that in this exercise the random sample is from a
distribution of the discrete type. All formulas in the text were derived under
the assumption that the random sample is from a distribution of the
continuous type and are not applicable. Why?

4.59. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ denote the order statistics of a random
sample of size 5 from a distribution having p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$,
zero elsewhere. Show that $Z_1 = Y_2$ and $Z_2 = Y_4 - Y_2$ are independent.
Hint: First find the joint p.d.f. of $Y_2$ and $Y_4$.

4.60. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size n from a distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere.
Show that the kth order statistic $Y_k$ has a beta p.d.f. with parameters $\alpha = k$
and $\beta = n - k + 1$.

4.61. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics from a Weibull
distribution, Exercise 3.44, Section 3.3. Find the distribution function and
p.d.f. of $Y_1$.

4.62. Find the probability that the range of a random sample of size 4
from the uniform distribution having the p.d.f. $f(x) = 1$, $0 < x < 1$, zero
elsewhere, is less than $\frac{1}{2}$.

4.63. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3
from a distribution having the p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere.
Show that $Z_1 = Y_1/Y_2$, $Z_2 = Y_2/Y_3$, and $Z_3 = Y_3$ are mutually independent.

4.64. If a random sample of size 2 is taken from a distribution having p.d.f.
$f(x) = 2(1 - x)$, $0 < x < 1$, zero elsewhere, compute the probability that
one sample observation is at least twice as large as the other.

4.65. Let $Y_1 < Y_2 < Y_3$ denote the order statistics of a random sample of size
3 from a distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere. Let
$Z = (Y_1 + Y_3)/2$ be the midrange of the sample. Find the p.d.f. of Z.

4.66. Let $Y_1 < Y_2$ denote the order statistics of a random sample of size 2
from $N(0, \sigma^2)$.

(a) Show that $E(Y_1) = -\sigma/\sqrt{\pi}$.

Hint: Evaluate $E(Y_1)$ by using the joint p.d.f. of $Y_1$ and $Y_2$, and
first integrating on $y_1$.

(b) Find the covariance of $Y_1$ and $Y_2$.

4.67. Let $Y_1 < Y_2$ be the order statistics of a random sample of size 2
from a distribution of the continuous type which has p.d.f. f(x) such that
$f(x) > 0$, provided that $x \ge 0$, and f(x) = 0 elsewhere. Show that the
independence of $Z_1 = Y_1$ and $Z_2 = Y_2 - Y_1$ characterizes the gamma p.d.f.
f(x), which has parameters $\alpha = 1$ and $\beta > 0$.

Hint: Use the change-of-variable technique to find the joint p.d.f. of
$Z_1$ and $Z_2$ from that of $Y_1$ and $Y_2$. Accept the fact that the functional
equation $h(0)h(x + y) = h(x)h(y)$ has the solution $h(x) = c_1 e^{c_2 x}$, where $c_1$
and $c_2$ are constants.

4.68. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size
n = 4 from a distribution with p.d.f. $f(x) = 2x$, $0 < x < 1$, zero elsewhere.

(a) Find the joint p.d.f. of $Y_3$ and $Y_4$.

(b) Find the conditional p.d.f. of $Y_3$, given $Y_4 = y_4$.

(c) Evaluate $E(Y_3 \mid y_4)$.

4.69. Two numbers are selected at random from the interval (0, 1). If these
values are uniformly and independently distributed, compute the probability
that the three resulting line segments, by cutting the interval at the
numbers, can form a triangle.

4.70. Let X and Y denote independent random variables with respective
probability density functions $f(x) = 2x$, $0 < x < 1$, zero elsewhere,
and $g(y) = 3y^2$, $0 < y < 1$, zero elsewhere. Let U = min (X, Y) and
V = max (X, Y). Find the joint p.d.f. of U and V.

Hint: Here the two inverse transformations are given by x = u, y = v
and x = v, y = u.

I； 

4.71. Let the joint p.d.f. of $X$ and $Y$ be $f(x, y) = \frac{12}{7} x(x + y)$, $0 < x < 1$,
$0 < y < 1$, zero elsewhere. Let $U = \min(X, Y)$ and $V = \max(X, Y)$. Find
the joint p.d.f. of $U$ and $V$.
4.72. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution of either
type. A measure of spread is Gini's mean difference

$$G = \sum_{j=2}^{n} \sum_{i=1}^{j-1} |X_i - X_j| \Big/ \binom{n}{2}.$$

(a) If $n = 10$, find $a_1, a_2, \ldots, a_{10}$ so that $G = \sum_{i=1}^{10} a_i Y_i$, where
$Y_1, Y_2, \ldots, Y_{10}$ are the order statistics of the sample.

(b) Show that $E(G) = 2\sigma/\sqrt{\pi}$ if the sample arises from the normal
distribution $N(\mu, \sigma^2)$.

4.73. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the exponential distribution with p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$,
zero elsewhere.

(a) Show that $Z_1 = nY_1$, $Z_2 = (n - 1)(Y_2 - Y_1)$, $Z_3 = (n - 2)(Y_3 - Y_2)$,
$\ldots$, $Z_n = Y_n - Y_{n-1}$ are independent and that each $Z_i$ has the
exponential distribution.

(b) Demonstrate that all linear functions of $Y_1, Y_2, \ldots, Y_n$, such as
$\sum_{i=1}^{n} a_i Y_i$, can be expressed as linear functions of independent
random variables.

4.74. In the Program Evaluation and Review Technique (PERT), we are
interested in the total time to complete a project that is comprised of
a large number of subprojects. For illustration, let $X_1, X_2, X_3$ be three
independent random times for three subprojects. If these subprojects are
in series (the first one must be completed before the second starts, etc.),
then we are interested in the sum $Y = X_1 + X_2 + X_3$. If these are in
parallel (can be worked on simultaneously), then we are interested in
$Z = \max(X_1, X_2, X_3)$. In the case each of these random variables has the
uniform distribution with p.d.f. $f(x) = 1$, $0 < x < 1$, zero elsewhere, find
(a) the p.d.f. of $Y$ and (b) the p.d.f. of $Z$.

4.7 The Moment-Generating-Function Technique 

The change-of-variable procedure has been seen, in certain cases, 
to be an effective method of finding the distribution of a function of 
several random variables. An alternative procedure, built around the 
concept of the m.g.f. of a distribution, will be presented in this section.
This procedure is particularly effective in certain instances. We should
recall that an m.g.f., when it exists, is unique and that it uniquely
determines the distribution of probability.

Let $h(x_1, x_2, \ldots, x_n)$ denote the joint p.d.f. of the $n$ random
variables $X_1, X_2, \ldots, X_n$. These random variables may or may not be
the observations of a random sample from some distribution that has
a given p.d.f. $f(x)$. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$. We seek $g(y_1)$, the
p.d.f. of the random variable $Y_1$. Consider the m.g.f. of $Y_1$. If it exists,
it is given by

$$M(t) = E(e^{tY_1}) = \int_{-\infty}^{\infty} e^{ty_1} g(y_1)\, dy_1$$

in the continuous case. It would seem that we need to know $g(y_1)$ before
we can compute $M(t)$. That this is not the case is a fundamental fact.
To see this consider

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp[t u_1(x_1, \ldots, x_n)]\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n, \qquad (1)$$

which we assume to exist for $-h < t < h$. We shall introduce $n$
new variables of integration. They are $y_1 = u_1(x_1, x_2, \ldots, x_n), \ldots,$
$y_n = u_n(x_1, x_2, \ldots, x_n)$. Momentarily, we assume that these functions
define a one-to-one transformation. Let $x_i = w_i(y_1, y_2, \ldots, y_n)$,
$i = 1, 2, \ldots, n$, denote the inverse functions and let $J$ denote the
Jacobian. Under this transformation, display (1) becomes

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{ty_1} |J|\, h(w_1, \ldots, w_n)\, dy_2 \cdots dy_n\, dy_1. \qquad (2)$$

In accordance with Section 4.5,

$$|J|\, h[w_1(y_1, y_2, \ldots, y_n), \ldots, w_n(y_1, y_2, \ldots, y_n)]$$

is the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$. The marginal p.d.f. $g(y_1)$ of $Y_1$
is obtained by integrating this joint p.d.f. on $y_2, \ldots, y_n$. Since the
factor $e^{ty_1}$ does not involve the variables $y_2, \ldots, y_n$, display (2) may
be written as

$$\int_{-\infty}^{\infty} e^{ty_1} g(y_1)\, dy_1. \qquad (3)$$

But this is by definition the m.g.f. $M(t)$ of the distribution of $Y_1$.
That is, we can compute $E\{\exp[t u_1(X_1, \ldots, X_n)]\}$ and have the value
of $E(e^{tY_1})$, where $Y_1 = u_1(X_1, \ldots, X_n)$. This fact provides another
technique to help us find the p.d.f. of a function of several random
variables. For if the m.g.f. of $Y_1$ is seen to be that of a certain kind of
distribution, the uniqueness property makes it certain that $Y_1$ has that
kind of distribution. When the p.d.f. of $Y_1$ is obtained in this manner,
we say that we use the moment-generating-function technique.














The reader will observe that we have assumed the transformation 
to be one-to-one. We did this for simplicity of presentation. If the 
transformation is not one-to-one, let 

$$x_j = w_{ji}(y_1, \ldots, y_n), \qquad j = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, k,$$

denote the $k$ groups of $n$ inverse functions each. Let $J_i$, $i = 1, 2, \ldots, k$,
denote the $k$ Jacobians. Then

$$\sum_{i=1}^{k} |J_i|\, h[w_{1i}(y_1, \ldots, y_n), \ldots, w_{ni}(y_1, \ldots, y_n)] \qquad (4)$$

is the joint p.d.f. of $Y_1, \ldots, Y_n$. Then display (1) becomes display (2)
with $|J|\, h(w_1, \ldots, w_n)$ replaced by display (4). Hence our result is valid
if the transformation is not one-to-one. It seems evident that we can
treat the discrete case in an analogous manner with the same result. 

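It should be noted that the expectation of $Y_1$ can be computed in
like manner. That is,

$$E(Y_1) = \int_{-\infty}^{\infty} y_1 g(y_1)\, dy_1
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u_1(x_1, \ldots, x_n)\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n,$$

and this fact has been mentioned earlier in the book. Moreover, this
holds for the expectation of any function of $Y_1$, say $w(Y_1)$; that is,

$$E[w(Y_1)] = \int_{-\infty}^{\infty} w(y_1) g(y_1)\, dy_1
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} w[u_1(x_1, \ldots, x_n)]\, h(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$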


We shall now give some examples and prove some theorems where 
we use the moment-generating-function technique. In the first example,
to emphasize the nature of the problem, we find the distribution of a 
rather simple statistic both by a direct probabilistic argument and by 
the moment-generating-function technique. 

Example 1. Let the independent random variables $X_1$ and $X_2$ have the
same p.d.f.

$$f(x) = \frac{x}{6}, \quad x = 1, 2, 3,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere};$$

so the joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1) f(x_2) = \frac{x_1 x_2}{36}, \quad x_1 = 1, 2, 3, \quad x_2 = 1, 2, 3,$$
$$\phantom{f(x_1) f(x_2)} = 0 \quad \text{elsewhere}.$$

A probability, such as $\Pr(X_1 = 2, X_2 = 3)$, can be seen immediately to be
$(2)(3)/36 = \frac{1}{6}$. However, consider a probability such as $\Pr(X_1 + X_2 = 3)$. The
computation can be made by first observing that the event $X_1 + X_2 = 3$ is the
union, exclusive of the events with probability zero, of the two mutually
exclusive events $(X_1 = 1, X_2 = 2)$ and $(X_1 = 2, X_2 = 1)$. Thus

$$\Pr(X_1 + X_2 = 3) = \Pr(X_1 = 1, X_2 = 2) + \Pr(X_1 = 2, X_2 = 1)
= \frac{(1)(2)}{36} + \frac{(2)(1)}{36} = \frac{4}{36}.$$

More generally, let $y$ represent any of the numbers 2, 3, 4, 5, 6. The probability
of each of the events $X_1 + X_2 = y$, $y = 2, 3, 4, 5, 6$, can be computed as in the
case $y = 3$. Let $g(y) = \Pr(X_1 + X_2 = y)$. Then the table

    y       2       3       4       5       6
    g(y)    1/36    4/36    10/36   12/36   9/36

gives the values of $g(y)$ for $y = 2, 3, 4, 5, 6$. For all other values of $y$, $g(y) = 0$.
What we have actually done is to define a new random variable $Y$ by
$Y = X_1 + X_2$, and we have found the p.d.f. $g(y)$ of this random variable $Y$.
We shall now solve the same problem, and by the moment-generating-function
technique.

Now the m.g.f. of $Y$ is

$$M(t) = E(e^{t(X_1 + X_2)}) = E(e^{tX_1} e^{tX_2}) = E(e^{tX_1}) E(e^{tX_2}),$$

since $X_1$ and $X_2$ are independent. In this example $X_1$ and $X_2$ have the same
distribution, so they have the same m.g.f.; that is,

$$E(e^{tX_1}) = E(e^{tX_2}) = \tfrac{1}{6} e^{t} + \tfrac{2}{6} e^{2t} + \tfrac{3}{6} e^{3t}.$$

Thus

$$M(t) = \left(\tfrac{1}{6} e^{t} + \tfrac{2}{6} e^{2t} + \tfrac{3}{6} e^{3t}\right)^2
= \tfrac{1}{36} e^{2t} + \tfrac{4}{36} e^{3t} + \tfrac{10}{36} e^{4t} + \tfrac{12}{36} e^{5t} + \tfrac{9}{36} e^{6t}.$$

This form of $M(t)$ tells us immediately that the p.d.f. $g(y)$ of $Y$ is zero except
at $y = 2, 3, 4, 5, 6$, and that $g(y)$ assumes the values $\frac{1}{36}, \frac{4}{36}, \frac{10}{36}, \frac{12}{36}, \frac{9}{36}$,
respectively, at these points where $g(y) > 0$. This is, of course, the same
result that was obtained in the first solution. There appears here to be little,
if any, preference for one solution over the other. But in more complicated
situations, and particularly with random variables of the continuous type, the
moment-generating-function technique can prove very powerful.
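
The two solutions of Example 1 can be compared mechanically. The following is a minimal sketch, not part of the original text: it treats the m.g.f. as a polynomial in $e^t$ and squares it, assuming only the p.d.f. $f(x) = x/6$, $x = 1, 2, 3$, given above. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# p.d.f. of Example 1: f(x) = x/6 for x = 1, 2, 3
f = {x: Fraction(x, 6) for x in (1, 2, 3)}

def pmf_of_sum_direct(f):
    """Add Pr(X1 = x1) Pr(X2 = x2) over all pairs with x1 + x2 = y."""
    g = {}
    for x1, x2 in product(f, repeat=2):
        g[x1 + x2] = g.get(x1 + x2, Fraction(0)) + f[x1] * f[x2]
    return g

def pmf_of_sum_mgf(f):
    """Square the m.g.f. sum_x f(x) e^{tx}, viewed as a polynomial in e^t.

    The coefficient of e^{ty} in [M(t)]^2 is Pr(X1 + X2 = y).
    """
    coeffs = [Fraction(0)] * (max(f) + 1)      # coeffs[x] multiplies e^{tx}
    for x, p in f.items():
        coeffs[x] = p
    squared = [Fraction(0)] * (2 * max(f) + 1)
    for i, a in enumerate(coeffs):
        for j, b in enumerate(coeffs):
            squared[i + j] += a * b
    return {y: c for y, c in enumerate(squared) if c != 0}

g_direct = pmf_of_sum_direct(f)
g_mgf = pmf_of_sum_mgf(f)
```

Both dictionaries carry the same five probabilities on $y = 2, \ldots, 6$; exact fractions are used so that the comparison does not depend on rounding.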

Example 2. Let $X_1$ and $X_2$ be independent with normal distributions
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Define the random variable $Y$ by
$Y = X_1 - X_2$. The problem is to find $g(y)$, the p.d.f. of $Y$. This will be done
by first finding the m.g.f. of $Y$. It is

$$M(t) = E(e^{t(X_1 - X_2)}) = E(e^{tX_1} e^{-tX_2}) = E(e^{tX_1}) E(e^{-tX_2}),$$

since $X_1$ and $X_2$ are independent. It is known that

$$E(e^{tX_1}) = \exp\left(\mu_1 t + \frac{\sigma_1^2 t^2}{2}\right)$$

and that

$$E(e^{tX_2}) = \exp\left(\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right)$$

for all real $t$. Then $E(e^{-tX_2})$ can be obtained from $E(e^{tX_2})$ by replacing $t$ by
$-t$. That is,

$$E(e^{-tX_2}) = \exp\left(-\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right).$$

Finally, then,

$$M(t) = \exp\left(\mu_1 t + \frac{\sigma_1^2 t^2}{2}\right) \exp\left(-\mu_2 t + \frac{\sigma_2^2 t^2}{2}\right)
= \exp\left((\mu_1 - \mu_2) t + \frac{(\sigma_1^2 + \sigma_2^2) t^2}{2}\right).$$

The distribution of $Y$ is completely determined by its m.g.f. $M(t)$, and it is seen
that $Y$ has the p.d.f. $g(y)$, which is $N(\mu_1 - \mu_2, \sigma_1^2 + \sigma_2^2)$. That is, the difference
between two independent, normally distributed, random variables is itself a
random variable which is normally distributed with mean equal to the
difference of the means (in the order indicated) and the variance equal to the
sum of the variances.

The following theorem, which is a generalization of Example 2, is 
very important in distribution theory. 
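Theorem 1. Let $X_1, X_2, \ldots, X_n$ be independent random variables
having, respectively, the normal distributions $N(\mu_1, \sigma_1^2)$, $N(\mu_2, \sigma_2^2)$, $\ldots$,
and $N(\mu_n, \sigma_n^2)$. The random variable $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$,
where $k_1, k_2, \ldots, k_n$ are real constants, is normally distributed with
mean $k_1 \mu_1 + \cdots + k_n \mu_n$ and variance $k_1^2 \sigma_1^2 + \cdots + k_n^2 \sigma_n^2$. That is, $Y$ is
$N\left(\sum_{1}^{n} k_i \mu_i, \sum_{1}^{n} k_i^2 \sigma_i^2\right)$.

Proof. Because $X_1, X_2, \ldots, X_n$ are independent, the m.g.f. of $Y$ is
given by

$$M(t) = E\{\exp[t(k_1 X_1 + k_2 X_2 + \cdots + k_n X_n)]\}
= E(e^{tk_1 X_1}) E(e^{tk_2 X_2}) \cdots E(e^{tk_n X_n}).$$

Now

$$E(e^{tX_i}) = \exp\left(\mu_i t + \frac{\sigma_i^2 t^2}{2}\right)$$

for all real $t$, $i = 1, 2, \ldots, n$. Hence we have

$$E(e^{tk_i X_i}) = \exp\left[\mu_i (k_i t) + \frac{\sigma_i^2 (k_i t)^2}{2}\right].$$

That is, the m.g.f. of $Y$ is

$$M(t) = \prod_{i=1}^{n} \exp\left[(k_i \mu_i) t + \frac{(k_i^2 \sigma_i^2) t^2}{2}\right]
= \exp\left[\left(\sum_{1}^{n} k_i \mu_i\right) t + \frac{\left(\sum_{1}^{n} k_i^2 \sigma_i^2\right) t^2}{2}\right].$$

But this is the m.g.f. of a distribution that is $N\left(\sum_{1}^{n} k_i \mu_i, \sum_{1}^{n} k_i^2 \sigma_i^2\right)$.
This is the desired result.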
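
As a small illustration of Theorem 1, the sketch below builds the distribution of $Y = \sum k_i X_i$ from the constants, means, and variances. It is only a sketch: the helper name `linear_combination` and the numerical inputs are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for the normal distribution.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(ks, mus, variances):
    """Distribution of Y = sum k_i X_i for independent X_i that are N(mu_i, var_i)."""
    mean = sum(k * mu for k, mu in zip(ks, mus))
    var = sum(k * k * v for k, v in zip(ks, variances))
    return NormalDist(mean, sqrt(var))   # NormalDist takes the standard deviation

# Hypothetical inputs: Y = 2 X1 - X2 + 3 X3
Y = linear_combination(ks=[2, -1, 3],
                       mus=[1.0, 0.0, -1.0],
                       variances=[1.0, 4.0, 0.25])

# Probabilities for Y then come from the normal c.d.f.
prob_negative = Y.cdf(0.0)
```

Here the mean is $2(1) - 0 + 3(-1) = -1$ and the variance is $4(1) + 1(4) + 9(0.25) = 10.25$, in agreement with the statement of the theorem.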


The next theorem is a generalization of Theorem 1.

Theorem 2. If $X_1, X_2, \ldots, X_n$ are independent random variables
with respective moment-generating functions $M_i(t)$, $i = 1, 2, 3, \ldots, n$,
then the moment-generating function of

$$Y = \sum_{i=1}^{n} a_i X_i,$$

where $a_1, a_2, \ldots, a_n$ are real constants, is

$$M_Y(t) = \prod_{i=1}^{n} M_i(a_i t).$$

Proof. The m.g.f. of $Y$ is given by

$$M_Y(t) = E[e^{tY}] = E[e^{t(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n)}]$$
$$= E[e^{a_1 t X_1} e^{a_2 t X_2} \cdots e^{a_n t X_n}]$$
$$= E[e^{a_1 t X_1}] E[e^{a_2 t X_2}] \cdots E[e^{a_n t X_n}]$$

because $X_1, X_2, \ldots, X_n$ are independent. However, since

$$E(e^{tX_i}) = M_i(t),$$

then

$$E(e^{a_i t X_i}) = M_i(a_i t).$$

Thus we have that

$$M_Y(t) = M_1(a_1 t) M_2(a_2 t) \cdots M_n(a_n t) = \prod_{i=1}^{n} M_i(a_i t).$$

A corollary follows immediately, and it will be used in some 
important examples. 

Corollary. If $X_1, X_2, \ldots, X_n$ are observations of a random sample
from a distribution with moment-generating function $M(t)$, then

(a) The moment-generating function of $Y = \sum_{i=1}^{n} X_i$ is

$$M_Y(t) = \prod_{i=1}^{n} M(t) = [M(t)]^n;$$

(b) The moment-generating function of $\bar{X} = \sum_{i=1}^{n} X_i / n$ is

$$M_{\bar{X}}(t) = \prod_{i=1}^{n} M\left(\frac{t}{n}\right) = \left[M\left(\frac{t}{n}\right)\right]^n.$$

Proof. For (a), let $a_i = 1$, $i = 1, 2, \ldots, n$, in Theorem 2. For (b),
take $a_i = 1/n$, $i = 1, 2, \ldots, n$.

The following examples and the exercises give some important applications
of Theorem 2 and its corollary.

Example 3. Let $X_1, X_2, \ldots, X_n$ denote the outcomes on $n$ Bernoulli trials.
The m.g.f. of $X_i$, $i = 1, 2, \ldots, n$, is

$$M(t) = 1 - p + pe^{t}.$$

If $Y = \sum_{i=1}^{n} X_i$, then

$$M_Y(t) = \prod_{i=1}^{n} (1 - p + pe^{t}) = (1 - p + pe^{t})^n.$$

Thus we again see that $Y$ is $b(n, p)$.
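
The identification in Example 3 can be checked by expanding $(1 - p + pe^t)^n$ as a polynomial in $e^t$ and reading off the coefficient of $e^{ty}$. The sketch below does this with exact fractions; the values $p = 1/3$ and $n = 5$ and the function name are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction
from math import comb

def mgf_power_coefficients(p, n):
    """Coefficients of e^{ty}, y = 0, ..., n, in (1 - p + p e^t)^n."""
    coeffs = [Fraction(1)]                   # start from the constant polynomial 1
    for _ in range(n):                       # multiply by (1 - p + p e^t), n times
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for y, c in enumerate(coeffs):
            nxt[y] += c * (1 - p)            # term 1 - p keeps the power of e^t
            nxt[y + 1] += c * p              # term p e^t raises it by one
        coeffs = nxt
    return coeffs

p, n = Fraction(1, 3), 5                     # illustrative values
from_mgf = mgf_power_coefficients(p, n)
binomial = [comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]
```

The list `from_mgf` agrees term by term with the $b(n, p)$ probabilities in `binomial`, which is the uniqueness argument of the example in computational form.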

Example 4. Let $X_1, X_2, X_3$ be the observations of a random sample of size
$n = 3$ from the exponential distribution having mean $\beta$ and, of course,
m.g.f. $M(t) = 1/(1 - \beta t)$, $t < 1/\beta$. The m.g.f. of $Y = X_1 + X_2 + X_3$ is

$$M_Y(t) = [(1 - \beta t)^{-1}]^3 = (1 - \beta t)^{-3}, \quad t < 1/\beta,$$

which is that of a gamma distribution with parameters $\alpha = 3$ and $\beta$. Thus $Y$
has this distribution. On the other hand, the m.g.f. of $\bar{X}$ is

$$M_{\bar{X}}(t) = \left[\left(1 - \frac{\beta t}{3}\right)^{-1}\right]^3 = \left(1 - \frac{\beta t}{3}\right)^{-3}, \quad t < 3/\beta,$$

and hence the distribution of $\bar{X}$ is gamma with parameters $\alpha = 3$ and $\beta/3$,
respectively.

The next example is so important that we state it as a theorem. 

Theorem 3. Let $X_1, X_2, \ldots, X_n$ be independent variables that have,
respectively, the chi-square distributions $\chi^2(r_1)$, $\chi^2(r_2)$, $\ldots$, and $\chi^2(r_n)$.
Then the random variable $Y = X_1 + X_2 + \cdots + X_n$ has a chi-square
distribution with $r_1 + \cdots + r_n$ degrees of freedom; that is, $Y$ is
$\chi^2(r_1 + \cdots + r_n)$.

Proof. Since

$$M_i(t) = E(e^{tX_i}) = (1 - 2t)^{-r_i/2}, \quad t < \tfrac{1}{2}, \quad i = 1, 2, \ldots, n,$$

we have, using Theorem 2 with $a_1 = \cdots = a_n = 1$,

$$M(t) = (1 - 2t)^{-(r_1 + r_2 + \cdots + r_n)/2}, \quad t < \tfrac{1}{2}.$$

But this is the m.g.f. of a distribution that is $\chi^2(r_1 + r_2 + \cdots + r_n)$.
Accordingly, $Y$ has this chi-square distribution.

Next, let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from
a distribution that is $N(\mu, \sigma^2)$. In accordance with Theorem 2 of
Section 3.4, each of the random variables $(X_i - \mu)^2/\sigma^2$, $i = 1, 2, \ldots, n$,
is $\chi^2(1)$. Moreover, these $n$ random variables are independent.
Accordingly, by Theorem 3, the random variable $Y = \sum_{1}^{n} [(X_i - \mu)/\sigma]^2$
is $\chi^2(n)$. This proves the following theorem.

Theorem 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$
from a distribution that is $N(\mu, \sigma^2)$. The random variable

$$Y = \sum_{1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2$$

has a chi-square distribution with $n$ degrees of freedom.

Not always do we sample from a distribution of one random 
variable. Let the random variables $X$ and $Y$ have the joint p.d.f.
$f(x, y)$ and let the $2n$ random variables $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$
have the joint p.d.f.

$$f(x_1, y_1) f(x_2, y_2) \cdots f(x_n, y_n).$$

The $n$ random pairs $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ are then independent
and are said to constitute a random sample of size $n$ from the
distribution of $X$ and $Y$. In the next paragraph we shall take $f(x, y)$ to
be the normal bivariate p.d.f., and we shall solve a problem in sampling
theory when we are sampling from this two-variable distribution.

Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of
size $n$ from a bivariate normal distribution with p.d.f. $f(x, y)$ and
parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. We wish to find the joint p.d.f. of the
two statistics $\bar{X} = \sum_{1}^{n} X_i/n$ and $\bar{Y} = \sum_{1}^{n} Y_i/n$. We call $\bar{X}$ the mean of
$X_1, \ldots, X_n$ and $\bar{Y}$ the mean of $Y_1, \ldots, Y_n$. Since the joint p.d.f. of
the $2n$ random variables $(X_i, Y_i)$, $i = 1, 2, \ldots, n$, is given by

$$h = f(x_1, y_1) f(x_2, y_2) \cdots f(x_n, y_n),$$

the m.g.f. of the two means $\bar{X}$ and $\bar{Y}$ is given by

$$M(t_1, t_2) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(\frac{t_1 \sum_{1}^{n} x_i}{n} + \frac{t_2 \sum_{1}^{n} y_i}{n}\right) h\, dx_1\, dy_1 \cdots dx_n\, dy_n$$
$$= \prod_{i=1}^{n} \left[\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left(\frac{t_1 x_i}{n} + \frac{t_2 y_i}{n}\right) f(x_i, y_i)\, dx_i\, dy_i\right].$$

The justification of the form of the right-hand member of the second equality
is that each pair $(X_i, Y_i)$ has the same p.d.f., and that these $n$ pairs are
independent. The twofold integral in the brackets in the last equality is the
joint m.g.f. of $X_i$ and $Y_i$ (see Section 3.5) with $t_1$ replaced by $t_1/n$ and $t_2$
replaced by $t_2/n$. Accordingly,

$$M(t_1, t_2) = \prod_{i=1}^{n} \exp\left[\frac{t_1 \mu_1}{n} + \frac{t_2 \mu_2}{n}
+ \frac{\sigma_1^2 (t_1/n)^2 + 2\rho \sigma_1 \sigma_2 (t_1/n)(t_2/n) + \sigma_2^2 (t_2/n)^2}{2}\right]$$
$$= \exp\left[t_1 \mu_1 + t_2 \mu_2
+ \frac{(\sigma_1^2/n) t_1^2 + 2\rho (\sigma_1 \sigma_2/n) t_1 t_2 + (\sigma_2^2/n) t_2^2}{2}\right].$$

But this is the m.g.f. of a bivariate normal distribution with means
$\mu_1$ and $\mu_2$, variances $\sigma_1^2/n$ and $\sigma_2^2/n$, and correlation coefficient $\rho$;
therefore, $\bar{X}$ and $\bar{Y}$ have this joint distribution.


EXERCISES 

4.75. Let the i.i.d. random variables $X_1$ and $X_2$ have the same p.d.f. $f(x) = \frac{1}{6}$,
$x = 1, 2, 3, 4, 5, 6$, zero elsewhere. Find the p.d.f. of $Y = X_1 + X_2$. Note,
under appropriate assumptions, that $Y$ may be interpreted as the sum of
the spots that appear when two dice are cast.

4.76. Let $X_1$ and $X_2$ be independent with normal distributions $N(6, 1)$ and
$N(7, 1)$, respectively. Find $\Pr(X_1 > X_2)$.

Hint: Write $\Pr(X_1 > X_2) = \Pr(X_1 - X_2 > 0)$ and determine the
distribution of $X_1 - X_2$.

4.77. Let $X_1$ and $X_2$ be independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ have chi-square distributions with $r_1$ and $r$ degrees of freedom,
respectively. Here $r_1 < r$. Show that $X_2$ has a chi-square distribution with
$r - r_1$ degrees of freedom.

Hint: Write $M(t) = E(e^{t(X_1 + X_2)})$ and make use of the independence of
$X_1$ and $X_2$.

4.78. Let the independent random variables $X_1$ and $X_2$ have binomial
distributions with parameters $n_1$, $p_1 = \frac{1}{2}$ and $n_2$, $p_2 = \frac{1}{2}$, respectively. Show
that $Y = X_1 - X_2 + n_2$ has a binomial distribution with parameters
$n = n_1 + n_2$, $p = \frac{1}{2}$.

4.79. Let $X_1, X_2, X_3$ be a random sample of size $n = 3$ from $N(1, 4)$. Compute
$P(X_1 + 2X_2 - 2X_3 > 7)$.
4.80. Let $X_1$ and $X_2$ be two independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ have Poisson distributions with means $\mu_1$ and $\mu > \mu_1$,
respectively. Find the distribution of $X_2$.

4.81. Let $X_1, X_2$ be two independent gamma random variables with
parameters $\alpha_1 = 3$, $\beta_1 = 3$ and $\alpha_2 = 5$, $\beta_2 = 1$, respectively.

(a) Find the m.g.f. of $Y = 2X_1 + 6X_2$.

(b) What is the distribution of $Y$?

4.82. A certain job is completed in three steps in series. The means and
standard deviations for the steps are (in minutes):

    Step    Mean    Standard Deviation
    1       17      2
    2       13      1
    3       13      2

Assuming independent steps and normal distributions, compute the
probability that the job will take less than 40 minutes to complete.

4.83. Let $X$ be $N(0, 1)$. Use the moment-generating-function technique to
show that $Y = X^2$ is $\chi^2(1)$.

Hint: Evaluate the integral that represents $E(e^{tX^2})$ by writing
$w = x\sqrt{1 - 2t}$, $t < \frac{1}{2}$.

4.84. Let $X_1, X_2, \ldots, X_n$ denote $n$ mutually independent random variables
with the moment-generating functions $M_1(t), M_2(t), \ldots, M_n(t)$, respectively.

(a) Show that $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$, where $k_1, k_2, \ldots, k_n$ are real
constants, has the m.g.f. $M(t) = \prod_{1}^{n} M_i(k_i t)$.

(b) If each $k_i = 1$ and if $X_i$ is Poisson with mean $\mu_i$, $i = 1, 2, \ldots, n$, prove
that $Y$ is Poisson with mean $\mu_1 + \cdots + \mu_n$.

4.85. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with m.g.f.
$M(t)$, show that the moment-generating functions of $\sum_{1}^{n} X_i$ and $\sum_{1}^{n} X_i/n$ are,
respectively, $[M(t)]^n$ and $[M(t/n)]^n$.

4.86. In Exercise 4.74 concerning PERT, assume that each of the three
independent variables has the p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere.
Find:

(a) The p.d.f. of $Y$.

(b) The p.d.f. of $Z$.

4.87. If $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, show that $Z = aX + bY + c$ is

$$N(a\mu_1 + b\mu_2 + c,\ a^2 \sigma_1^2 + 2ab\rho \sigma_1 \sigma_2 + b^2 \sigma_2^2),$$

where $a$, $b$, and $c$ are constants.

Hint: Use the m.g.f. $M(t_1, t_2)$ of $X$ and $Y$ to find the m.g.f. of $Z$.
4.88. Let $X$ and $Y$ have a bivariate normal distribution with parameters
$\mu_1 = 25$, $\mu_2 = 35$, $\sigma_1^2 = 4$, $\sigma_2^2 = 16$, and $\rho = \frac{17}{32}$. If $Z = 3X - 2Y$, find
$\Pr(-2 < Z < 19)$.

4.89. Let $U$ and $V$ be independent random variables, each having a standard
normal distribution. Show that the m.g.f. $E(e^{t(UV)})$ of the product $UV$ is
$(1 - t^2)^{-1/2}$, $-1 < t < 1$.

Hint: Compare $E(e^{tUV})$ with the integral of a bivariate normal p.d.f. that
has means equal to zero.

4.90. Let $X$ and $Y$ have a bivariate normal distribution with the parameters
$\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Show that
$W = X - \mu_1$ and $Z = (Y - \mu_2) - \rho(\sigma_2/\sigma_1)(X - \mu_1)$
are independent normal variables.


4.91. Let $X_1, X_2, X_3$ be a random sample of size $n = 3$ from the standard
normal distribution.

(a) Show that $Y_1 = X_1 + \delta X_3$, $Y_2 = X_2 + \delta X_3$ has a bivariate normal
distribution.

(b) Find the value of $\delta$ so that the correlation coefficient $\rho = \frac{1}{2}$.

(c) What additional transformation involving $Y_1$ and $Y_2$ would produce a
bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances $\sigma_1^2$ and
$\sigma_2^2$, and the same correlation coefficient $\rho$?

4.92. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the normal
distribution $N(\mu, \sigma^2)$. Find the joint distribution of $Y = \sum_{1}^{n} a_i X_i$ and
$Z = \sum_{1}^{n} b_i X_i$, where the $a_i$ and $b_i$ are real constants. When, and only when,
are $Y$ and $Z$ independent?

Hint: Note that the joint m.g.f. $E\left[\exp\left(t_1 \sum_{1}^{n} a_i X_i + t_2 \sum_{1}^{n} b_i X_i\right)\right]$ is that
of a bivariate normal distribution.

4.93. Let $X_1, X_2$ be a random sample of size 2 from a distribution with positive
variance and m.g.f. $M(t)$. If $Y = X_1 + X_2$ and $Z = X_1 - X_2$ are independent,
prove that the distribution from which the sample is taken is a normal
distribution.

Hint: Show that

$$m(t_1, t_2) = E\{\exp[t_1(X_1 + X_2) + t_2(X_1 - X_2)]\} = M(t_1 + t_2) M(t_1 - t_2).$$

Express each member of $m(t_1, t_2) = m(t_1, 0) m(0, t_2)$ in terms of $M$;
differentiate twice with respect to $t_2$; set $t_2 = 0$; and solve the resulting
differential equation in $M$.


4.8 The Distributions of $\bar{X}$ and $nS^2/\sigma^2$

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n \ge 2$ from a
distribution that is $N(\mu, \sigma^2)$. In this section we shall investigate the
distributions of the mean and the variance of this random sample,
that is, the distributions of the two statistics $\bar{X} = \sum_{1}^{n} X_i/n$ and
$S^2 = \sum_{1}^{n} (X_i - \bar{X})^2/n$.

The problem of the distribution of $\bar{X}$, the mean of the sample, is
solved by the use of Theorem 1 of Section 4.7. We have here, in the
notation of the statement of that theorem, $\mu_1 = \mu_2 = \cdots = \mu_n = \mu$,
$\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_n^2 = \sigma^2$, and $k_1 = k_2 = \cdots = k_n = 1/n$. Accordingly,
$Y = \bar{X}$ has a normal distribution with mean and variance given by

$$\sum_{1}^{n} \left(\frac{1}{n}\right) \mu = \mu, \qquad \sum_{1}^{n} \left(\frac{1}{n}\right)^2 \sigma^2 = \frac{\sigma^2}{n},$$

respectively. That is, $\bar{X}$ is $N(\mu, \sigma^2/n)$.


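Example 1. Let $\bar{X}$ be the mean of a random sample of size 25 from a
distribution that is $N(75, 100)$. Thus $\bar{X}$ is $N(75, 4)$. Then, for instance,

$$\Pr(71 < \bar{X} < 79) = \Phi\left(\frac{79 - 75}{2}\right) - \Phi\left(\frac{71 - 75}{2}\right)
= \Phi(2) - \Phi(-2) = 0.954.$$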
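
The probability in Example 1 can be reproduced numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma2, n = 75.0, 100.0, 25
xbar = NormalDist(mu, sqrt(sigma2 / n))   # N(75, 4): standard deviation 2
prob = xbar.cdf(79) - xbar.cdf(71)        # Phi(2) - Phi(-2), about 0.954
```

Note that `NormalDist` is parameterized by the standard deviation, so the variance $\sigma^2/n = 4$ enters as $\sqrt{4} = 2$.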

We now take up the problem of the distribution of $S^2$, the variance
of a random sample $X_1, \ldots, X_n$ from a distribution that is $N(\mu, \sigma^2)$.
To do this, let us first consider the joint distribution of $Y_1 = \bar{X}$,
$Y_2 = X_2 - \bar{X}$, $Y_3 = X_3 - \bar{X}$, $\ldots$, $Y_n = X_n - \bar{X}$. The corresponding
inverse transformation

$$x_1 = y_1 - y_2 - y_3 - \cdots - y_n$$
$$x_2 = y_1 + y_2$$
$$x_3 = y_1 + y_3$$
$$\vdots$$
$$x_n = y_1 + y_n$$

has Jacobian $n$. Since

$$\sum_{1}^{n} (x_i - \mu)^2 = \sum_{1}^{n} (x_i - \bar{x} + \bar{x} - \mu)^2
= \sum_{1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$

because $2(\bar{x} - \mu) \sum_{1}^{n} (x_i - \bar{x}) = 0$, the joint p.d.f. of $X_1, X_2, \ldots, X_n$
can be written

$$\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{\sum (x_i - \bar{x})^2}{2\sigma^2} - \frac{n(\bar{x} - \mu)^2}{2\sigma^2}\right],$$

where $\bar{x}$ represents $(x_1 + x_2 + \cdots + x_n)/n$ and $-\infty < x_i < \infty$,
$i = 1, 2, \ldots, n$. Accordingly, with $y_1 = \bar{x}$ and $x_1 - \bar{x} = -y_2 - y_3 - \cdots - y_n$,
we find that the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ is

$$n \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left[-\frac{(-y_2 - \cdots - y_n)^2}{2\sigma^2} - \frac{\sum_{2}^{n} y_i^2}{2\sigma^2} - \frac{n(y_1 - \mu)^2}{2\sigma^2}\right],$$

$-\infty < y_i < \infty$, $i = 1, 2, \ldots, n$. Note that this is the product of the
p.d.f. of $Y_1$, namely,

$$\frac{1}{\sqrt{2\pi \sigma^2/n}} \exp\left[-\frac{(y_1 - \mu)^2}{2\sigma^2/n}\right], \quad -\infty < y_1 < \infty,$$

and a function of $y_2, \ldots, y_n$. Thus $Y_1$ must be independent of
the $n - 1$ random variables $Y_2, Y_3, \ldots, Y_n$ and that function of
$y_2, \ldots, y_n$ is the joint p.d.f. of $Y_2, Y_3, \ldots, Y_n$. Moreover, this means
that $Y_1 = \bar{X}$ and thus

$$\frac{n(Y_1 - \mu)^2}{\sigma^2} = \frac{n(\bar{X} - \mu)^2}{\sigma^2} = W_1$$

are independent of

$$\frac{(-Y_2 - \cdots - Y_n)^2 + \sum_{2}^{n} Y_i^2}{\sigma^2} = \frac{\sum_{1}^{n} (X_i - \bar{X})^2}{\sigma^2} = W_2.$$

Since $W_1$ is the square of a standard normal variable, it is distributed
as $\chi^2(1)$. Also, we know that

$$W = \sum_{1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2 = W_2 + W_1$$

is $\chi^2(n)$. From the independence of $W_1$ and $W_2$, we have

$$E(e^{tW}) = E(e^{tW_1}) E(e^{tW_2})$$

or, equivalently,

$$(1 - 2t)^{-n/2} = (1 - 2t)^{-1/2} E(e^{tW_2}), \quad t < \tfrac{1}{2}.$$

Thus

$$E(e^{tW_2}) = (1 - 2t)^{-(n-1)/2}, \quad t < \tfrac{1}{2},$$

and hence $W_2 = nS^2/\sigma^2$ is $\chi^2(n - 1)$. The determination of the p.d.f. of
$S^2$ is an easy exercise from this result (see Exercise 4.99).

To summarize, we have established, in this section, three important
properties of $\bar{X}$ and $S^2$ when the sample arises from a distribution which
is $N(\mu, \sigma^2)$:

1. $\bar{X}$ is $N(\mu, \sigma^2/n)$.

2. $nS^2/\sigma^2$ is $\chi^2(n - 1)$.

3. $\bar{X}$ and $S^2$ are independent.

For illustration, as the result of properties (1), (2), and (3), we have
that $\sqrt{n}(\bar{X} - \mu)/\sigma$ is $N(0, 1)$. Thus, from the definition of Student's $t$,

$$T = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{nS^2/[\sigma^2 (n - 1)]}} = \frac{\bar{X} - \mu}{S/\sqrt{n - 1}}$$

has a $t$-distribution with $n - 1$ degrees of freedom. It was a
random variable like this one that motivated Gosset's search for
the distribution of $T$. This $t$-statistic will play an important role in
statistical applications.
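
A simulation gives a rough empirical check of properties (2) and (3). The sketch below is an illustration only: the sample size, parameters, number of replications, and seed are arbitrary assumptions, and a sample covariance near zero is merely consistent with independence, not a proof of it. The divisor $n$ in $S^2$ follows the definition used in this section.

```python
import random
from statistics import fmean

def simulate(n, mu, sigma, reps, seed=0):
    """Draw `reps` normal samples of size n; return the means and n S^2 / sigma^2."""
    rng = random.Random(seed)
    means, w2_values = [], []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(xs)
        s2 = sum((x - xbar) ** 2 for x in xs) / n     # divisor n, as in the text
        means.append(xbar)
        w2_values.append(n * s2 / sigma**2)
    return means, w2_values

means, w2 = simulate(n=5, mu=10.0, sigma=2.0, reps=20000)

mean_w2 = fmean(w2)                                    # chi-square(4) has mean 4
var_w2 = fmean([(w - mean_w2) ** 2 for w in w2])       # and variance 8
mean_xbar = fmean(means)
cov = fmean([(m - mean_xbar) * (w - mean_w2)           # should be close to 0
             for m, w in zip(means, w2)])
```

With $n = 5$ the simulated values of $nS^2/\sigma^2$ should have mean near $n - 1 = 4$ and variance near $2(n - 1) = 8$, the moments of a $\chi^2(4)$ distribution.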

EXERCISES 

4.94. Let $\bar{X}$ be the mean of a random sample of size 5 from a normal
distribution with $\mu = 0$ and $\sigma^2 = 125$. Determine $c$ so that $\Pr(\bar{X} < c) = 0.90$.

4.95. If $\bar{X}$ is the mean of a random sample of size $n$ from a normal
distribution with mean $\mu$ and variance 100, find $n$ so that
$\Pr(\mu - 5 < \bar{X} < \mu + 5) = 0.954$.

4.96. Let $X_1, X_2, \ldots, X_{25}$ and $Y_1, Y_2, \ldots, Y_{25}$ be two independent random
samples from two normal distributions $N(0, 16)$ and $N(1, 9)$, respectively.
Let $\bar{X}$ and $\bar{Y}$ denote the corresponding sample means. Compute $\Pr(\bar{X} > \bar{Y})$.

4.97. Find the mean and variance of $S^2 = \sum_{1}^{n} (X_i - \bar{X})^2/n$, where $X_1, X_2, \ldots,$
$X_n$ is a random sample from $N(\mu, \sigma^2)$.

Hint: Find the mean and variance of $nS^2/\sigma^2$.
4.98. Let $S^2$ be the variance of a random sample of size 6 from the normal
distribution $N(\mu, 12)$. Find $\Pr(2.30 < S^2 < 22.2)$.

4.99. Find the p.d.f. of the sample variance $V = S^2$, provided that the
distribution from which the sample arises is $N(\mu, \sigma^2)$.

4.100. Let $\bar{X}$ and $\bar{Y}$ be the respective means of two independent random
samples, each of size 4, from the two respective normal distributions
$N(10, 9)$ and $N(3, 4)$. Compute $\Pr(\bar{X} > 2\bar{Y})$.

4.101. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from $N(0, \sigma^2)$.
(a) Find the constant $c$ so that $c(X_1 - X_2)/\sqrt{X_3^2 + X_4^2 + X_5^2}$ has a $t$-distribution.
(b) How many degrees of freedom are associated with this $T$?

4.102. If a random sample of size 2 is taken from a normal distribution with
mean 7 and variance 8, find the probability that the absolute value of the
difference of these two observations exceeds 2.

4.103. Let $\bar{X}$ and $S^2$ be the mean and the variance of a random sample
of size 25 from a distribution that is $N(3, 100)$. Then evaluate
$\Pr(0 < \bar{X} < 6,\ 55.2 < S^2 < 145.6)$.


4.9 Expectations of Functions of Random Variables

Let $X_1, X_2, \ldots, X_n$ denote random variables that have the joint
p.d.f. $f(x_1, x_2, \ldots, x_n)$. Let the random variable $Y$ be defined by
$Y = u(X_1, X_2, \ldots, X_n)$. In Section 4.7, we found that we could
compute expectations of functions of $Y$ without first finding the p.d.f.
of $Y$. Indeed, this fact was the basis of the moment-generating-function
procedure for finding the p.d.f. of $Y$. We can take advantage of this
in a number of other instances. Some illustrative examples will be
given.

Example 1. Say that $W$ is $N(0, 1)$, that $V$ is $\chi^2(r)$ with $r \ge 2$, and that $W$
and $V$ are independent. The mean of the random variable $T = W\sqrt{r/V}$ exists
and is zero because the graph of the p.d.f. of $T$ (see Section 4.4) is symmetric
about the vertical axis through $t = 0$. The variance of $T$, when it exists,
could be computed by integrating the product of $t^2$ and the p.d.f. of $T$.
But it seems much simpler to compute

$$\sigma_T^2 = E(T^2) = E\left(W^2 \frac{r}{V}\right) = E(W^2) E\left(\frac{r}{V}\right).$$

Now $W^2$ is $\chi^2(1)$, so $E(W^2) = 1$. Furthermore,

$$E\left(\frac{r}{V}\right) = \int_{0}^{\infty} \frac{r}{v} \frac{1}{\Gamma(r/2) 2^{r/2}} v^{r/2 - 1} e^{-v/2}\, dv$$

exists if $r > 2$ and is given by

$$\frac{r\Gamma[(r - 2)/2]}{2\Gamma(r/2)} = \frac{r\Gamma[(r - 2)/2]}{2[(r - 2)/2]\Gamma[(r - 2)/2]} = \frac{r}{r - 2}.$$

Thus $\sigma_T^2 = r/(r - 2)$, $r > 2$.
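
The gamma-function identity used in Example 1 is easy to confirm numerically. The following sketch evaluates $E(r/V) = r\Gamma[(r - 2)/2]/[2\Gamma(r/2)]$ with `math.gamma` for a few arbitrarily chosen values of $r$; the function name `var_t` is hypothetical.

```python
from math import gamma, isclose

def var_t(r):
    """E(T^2) = E(W^2) E(r/V) for T = W sqrt(r/V), which exists only for r > 2."""
    if r <= 2:
        raise ValueError("the variance of T exists only for r > 2")
    e_w2 = 1.0                                           # W^2 is chi-square(1)
    e_r_over_v = r * gamma((r - 2) / 2) / (2 * gamma(r / 2))
    return e_w2 * e_r_over_v

values = {r: var_t(r) for r in (3, 5, 10, 30)}
agrees = all(isclose(v, r / (r - 2)) for r, v in values.items())
```

For each of the chosen degrees of freedom the computed value matches $r/(r - 2)$, and the guard on $r \le 2$ reflects the existence condition stated in the example.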

Example 2. Let $X_i$ denote a random variable with mean $\mu_i$ and variance
$\sigma_i^2$, $i = 1, 2, \ldots, n$. Let $X_1, X_2, \ldots, X_n$ be independent and let $k_1, k_2, \ldots, k_n$
denote real constants. We shall compute the mean and variance of a linear
function $Y = k_1 X_1 + k_2 X_2 + \cdots + k_n X_n$. Because $E$ is a linear operator, the
mean of $Y$ is given by

$$\mu_Y = E(k_1 X_1 + k_2 X_2 + \cdots + k_n X_n)$$
$$= k_1 E(X_1) + k_2 E(X_2) + \cdots + k_n E(X_n)$$
$$= k_1 \mu_1 + k_2 \mu_2 + \cdots + k_n \mu_n = \sum_{1}^{n} k_i \mu_i.$$

The variance of $Y$ is given by

$$\sigma_Y^2 = E\{[(k_1 X_1 + \cdots + k_n X_n) - (k_1 \mu_1 + \cdots + k_n \mu_n)]^2\}$$
$$= E\{[k_1 (X_1 - \mu_1) + \cdots + k_n (X_n - \mu_n)]^2\}$$
$$= E\left\{\sum_{i=1}^{n} k_i^2 (X_i - \mu_i)^2 + 2 \sum\sum_{i<j} k_i k_j (X_i - \mu_i)(X_j - \mu_j)\right\}$$
$$= \sum_{i=1}^{n} k_i^2 E[(X_i - \mu_i)^2] + 2 \sum\sum_{i<j} k_i k_j E[(X_i - \mu_i)(X_j - \mu_j)].$$

Consider $E[(X_i - \mu_i)(X_j - \mu_j)]$, $i < j$. Because $X_i$ and $X_j$ are independent, we
have

$$E[(X_i - \mu_i)(X_j - \mu_j)] = E(X_i - \mu_i) E(X_j - \mu_j) = 0.$$

Finally, then,

$$\sigma_Y^2 = \sum_{i=1}^{n} k_i^2 E[(X_i - \mu_i)^2] = \sum_{i=1}^{n} k_i^2 \sigma_i^2.$$

We can obtain a more general result if, in Example 2, we remove
the hypothesis of independence of $X_1, X_2, \ldots, X_n$. We shall do this and
we shall let $\rho_{ij}$ denote the correlation coefficient of $X_i$ and $X_j$. Thus for
easy reference to Example 2, we write

$$E[(X_i - \mu_i)(X_j - \mu_j)] = \rho_{ij} \sigma_i \sigma_j, \quad i < j.$$

If we refer to Example 2, we see that again $\mu_Y = \sum_{1}^{n} k_i \mu_i$. But now

$$\sigma_Y^2 = \sum_{1}^{n} k_i^2 \sigma_i^2 + 2 \sum\sum_{i<j} k_i k_j \rho_{ij} \sigma_i \sigma_j.$$

Thus we have the following theorem.

Theorem 5. Let $X_1, \ldots, X_n$ denote random variables that have
means $\mu_1, \ldots, \mu_n$ and variances $\sigma_1^2, \ldots, \sigma_n^2$. Let $\rho_{ij}$, $i \ne j$, denote the
correlation coefficient of $X_i$ and $X_j$ and let $k_1, \ldots, k_n$ denote real
constants. The mean and the variance of the linear function

$$Y = \sum_{1}^{n} k_i X_i$$

are, respectively,

$$\mu_Y = \sum_{1}^{n} k_i \mu_i$$

and

$$\sigma_Y^2 = \sum_{1}^{n} k_i^2 \sigma_i^2 + 2 \sum\sum_{i<j} k_i k_j \rho_{ij} \sigma_i \sigma_j.$$
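
The formulas of Theorem 5 translate directly into a short routine. The sketch below is illustrative only: the function name and the numerical inputs are hypothetical, and the correlation coefficients are passed as a matrix of which only the entries with $i < j$ are read.

```python
def linear_function_moments(ks, mus, sigmas, rho):
    """Mean and variance of Y = sum k_i X_i.

    rho[i][j] is the correlation coefficient of X_i and X_j; only i < j is used.
    """
    n = len(ks)
    mean = sum(k * m for k, m in zip(ks, mus))
    var = sum((k * s) ** 2 for k, s in zip(ks, sigmas))      # sum k_i^2 sigma_i^2
    for i in range(n):
        for j in range(i + 1, n):                             # double sum over i < j
            var += 2 * ks[i] * ks[j] * rho[i][j] * sigmas[i] * sigmas[j]
    return mean, var

# Hypothetical inputs: Y = X1 + 2 X2 - X3
ks = [1, 2, -1]
mus = [0.0, 1.0, 2.0]
sigmas = [1.0, 2.0, 3.0]                                      # standard deviations
rho = [[1.0, 0.5, 0.0],
       [0.5, 1.0, -0.25],
       [0.0, -0.25, 1.0]]

mean_y, var_y = linear_function_moments(ks, mus, sigmas, rho)
```

For these inputs the mean is $0 + 2 - 2 = 0$ and the variance is $(1 + 16 + 9) + 4 + 0 + 6 = 36$. Setting every off-diagonal $\rho_{ij}$ to zero recovers the independent case of Example 2.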

The following corollary of this theorem is quite useful.

Corollary. Let $X_1, \ldots, X_n$ denote the observations of a random
sample of size $n$ from a distribution that has mean $\mu$ and variance $\sigma^2$. The
mean and the variance of $Y = \sum_{1}^{n} k_i X_i$ are, respectively, $\mu_Y = \left(\sum_{1}^{n} k_i\right) \mu$ and
$\sigma_Y^2 = \left(\sum_{1}^{n} k_i^2\right) \sigma^2$.

Example 3. Let $\bar{X} = \sum_{1}^{n} X_i/n$ denote the mean of a random sample of size $n$
from a distribution that has mean $\mu$ and variance $\sigma^2$. In accordance with
the corollary, we have $\mu_{\bar{X}} = \mu \sum_{1}^{n} (1/n) = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2 \sum_{1}^{n} (1/n)^2 = \sigma^2/n$. We
have seen, in Section 4.8, that if our sample is from a distribution that is
$N(\mu, \sigma^2)$, then $\bar{X}$ is $N(\mu, \sigma^2/n)$. It is interesting that $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$
whether the sample is or is not from a normal distribution.
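
The closing remark of Example 3 can be illustrated with a distribution that is clearly not normal. The sketch below, which is an added illustration rather than part of the text, enumerates every outcome of $n$ casts of a fair die and computes the mean and variance of $\bar{X}$ exactly; the choice of the die and of $n = 1, \ldots, 4$ is an assumption made for the example.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                                   # a fair die
p = Fraction(1, 6)
mu = sum(x * p for x in faces)                        # 7/2
sigma2 = sum((x - mu) ** 2 * p for x in faces)        # 35/12

def mean_and_variance_of_sample_mean(n):
    """Exact mean and variance of the mean of n independent casts."""
    outcomes = list(product(faces, repeat=n))
    w = p ** n                                        # each outcome is equally likely
    m = sum(Fraction(sum(o), n) * w for o in outcomes)
    v = sum((Fraction(sum(o), n) - m) ** 2 * w for o in outcomes)
    return m, v

results = {n: mean_and_variance_of_sample_mean(n) for n in (1, 2, 3, 4)}
```

For each $n$ the enumeration returns mean $\mu = 7/2$ and variance $\sigma^2/n = 35/(12n)$, as the corollary requires.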

EXERCISES

4.104. Let $X_1, X_2, X_3, X_4$ be four i.i.d. random variables having the same p.d.f.
$f(x) = 2x$, $0 < x < 1$, zero elsewhere. Find the mean and variance of the
sum $Y$ of these four random variables.

4.105. Let $X_1$ and $X_2$ be two independent random variables so that the
variances of $X_1$ and $X_2$ are $\sigma_1^2 = k$ and $\sigma_2^2 = 2$, respectively. Given that the
variance of $Y = 3X_2 - X_1$ is 25, find $k$.

4.106. If the independent variables $X_1$ and $X_2$ have means $\mu_1$, $\mu_2$ and variances
$\sigma_1^2$, $\sigma_2^2$, respectively, show that the mean and variance of the product
$Y = X_1 X_2$ are $\mu_1 \mu_2$ and $\sigma_1^2 \sigma_2^2 + \mu_1^2 \sigma_2^2 + \mu_2^2 \sigma_1^2$, respectively.

4.107. Find the mean and variance of the sum $Y$ of the observations of
a random sample of size 5 from the distribution having p.d.f.
$f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere.

4.108. Determine the mean and variance of the mean $\bar{X}$ of a random sample
of size 9 from a distribution having p.d.f. $f(x) = 4x^3$, $0 < x < 1$, zero
elsewhere.

4.109. Let $X$ and $Y$ be random variables with $\mu_1 = 1$, $\mu_2 = 4$, $\sigma_1^2 = 4$, $\sigma_2^2 = 6$,
$\rho = \frac{1}{2}$. Find the mean and variance of $Z = 3X - 2Y$.

4*110. Let X and Yhc independent random variables with means pL u fi 2 and 
variances ^ Determine the correlation coefficient ofX and Z — X — Y 
in terms of (i Vj fi 2 , of, 

4*111* Let fi and a 2 denote the mean and variance of the random variable X. 
Let Y= c + bX f where b and c are rea! constants. Show that the mean and 
the variance of Y are, respectively, c + bfi and l^a 2 , 

4*112. Find the mean and the variance of Y — 2X 2 + where 

X U X 2 , X 3 are observations of a random sample from a chi-square 
distribution with 6 degrees of freedom. 

• -V * ■ 

4_113* Let Xand Yhc random variables such that var (X) = 4, var (Y) — 2, 
and var (X+2Y)- 15. Determine the correlation coefficient of X and K 

4.114* Let X and Y be random variables with means /£ h variances <P 2 \ 
and correlation coefficient p. Show that the correlation coefficient of 
W == aX + b y a > 0^ and Z = cY + d, c > 0, is p, 

4.115. A person rolls a die, tosses a coin, and draws a card from an ordinary
deck. He receives $3 for each point up on the die, $10 for a head, $0 for
a tail, and $1 for each spot on the card (jack = 11, queen = 12, king = 13).
If we assume that the three random variables involved are independent and
uniformly distributed, compute the mean and variance of the amount to be
received.

4.116. Let $U$ and $V$ be two independent chi-square variables with $r_1$
and $r_2$ degrees of freedom, respectively. Find the mean and variance of
$F = (r_2 U)/(r_1 V)$. What restriction is needed on the parameters $r_1$ and $r_2$ in
order to ensure the existence of both the mean and the variance of $F$?
4.117. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution
with mean $\mu$ and variance $\sigma^2$. Show that $E(S^2) = (n - 1)\sigma^2/n$, where $S^2$ is
the variance of the random sample.

Hint: Write $S^2 = (1/n) \sum_{i=1}^{n} (X_i - \mu)^2 - (\bar{X} - \mu)^2$.

4.118. Let $X_1$ and $X_2$ be independent random variables with nonzero
variances. Find the correlation coefficient of $X_1 X_2$ and $X_1$ in terms of
the means and variances of $X_1$ and $X_2$.

4.119. Let $X_1$ and $X_2$ have a joint distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$,
and $\rho$. Find the correlation coefficient of the linear functions
$Y = a_1 X_1 + a_2 X_2$ and $Z = b_1 X_1 + b_2 X_2$ in terms of the real constants $a_1$, $a_2$,
$b_1$, $b_2$, and the parameters of the distribution.

4.120. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution
which has mean $\mu$ and variance $\sigma^2$. Use Chebyshev's inequality to show, for
every $\varepsilon > 0$, that $\lim_{n \to \infty} \Pr(|\bar{X} - \mu| < \varepsilon) = 1$; this is another form of the law
of large numbers.

4.121. Let $X_1$, $X_2$, and $X_3$ be random variables with equal variances but with
correlation coefficients $\rho_{12} = 0.3$, $\rho_{13} = 0.5$, and $\rho_{23} = 0.2$. Find the
correlation coefficient of the linear functions $Y = X_1 + X_2$ and
$Z = X_2 + X_3$.

4.122. Find the variance of the sum of 10 random variables if each has
variance 5 and if each pair has correlation coefficient 0.5.


4.123. Let $X$ and $Y$ have the parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Show that the
correlation coefficient of $X$ and $[Y - \rho(\sigma_2/\sigma_1)X]$ is zero.


4.124. Let $X_1$ and $X_2$ have a bivariate normal distribution with parameters $\mu_1$,
$\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$. Compute the means, the variances, and the correlation
coefficient of $Y_1 = \exp(X_1)$ and $Y_2 = \exp(X_2)$.

Hint: Various moments of $Y_1$ and $Y_2$ can be found by assigning
appropriate values to $t_1$ and $t_2$ in $E[\exp(t_1 X_1 + t_2 X_2)]$.

4.125. Let $X$ be $N(\mu, \sigma^2)$ and consider the transformation $X = \ln Y$ or,
equivalently, $Y = e^X$.

(a) Find the mean and the variance of $Y$ by first determining $E(e^X)$ and
$E[(e^X)^2]$.

Hint: Use the m.g.f. of $X$.

(b) Find the p.d.f. of $Y$. This is the p.d.f. of the lognormal distribution.

4.126. Let $X_1$ and $X_2$ have a trinomial distribution with parameters $n$, $p_1$, $p_2$.

(a) What is the distribution of $Y = X_1 + X_2$?

(b) From the equality $\sigma_Y^2 = \sigma_1^2 + \sigma_2^2 + 2\rho\sigma_1\sigma_2$, once again determine the
correlation coefficient $\rho$ of $X_1$ and $X_2$.
4.127. Let $Y_1 = X_1 + X_2$ and $Y_2 = X_2 + X_3$, where $X_1$, $X_2$, and $X_3$ are three
independent random variables. Find the joint m.g.f. and the correlation
coefficient of $Y_1$ and $Y_2$ provided that:

(a) $X_i$ has a Poisson distribution with mean $\mu_i$, $i = 1, 2, 3$.

(b) $X_i$ is $N(\mu_i, \sigma_i^2)$, $i = 1, 2, 3$.


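4.128. Let $X_1, \ldots, X_n$ be random variables that have means $\mu_1, \ldots, \mu_n$ and
variances $\sigma_1^2, \ldots, \sigma_n^2$. Let $\rho_{ij}$, $i \neq j$, denote the correlation coefficient of $X_i$
and $X_j$. Let $a_1, \ldots, a_n$ and $b_1, \ldots, b_n$ be real constants. Show that the
covariance of $Y = \sum_{i=1}^{n} a_i X_i$ and $Z = \sum_{j=1}^{n} b_j X_j$ is $\sum_{j=1}^{n} \sum_{i=1}^{n} a_i b_j \sigma_i \sigma_j \rho_{ij}$, where
$\rho_{ii} = 1$, $i = 1, 2, \ldots, n$.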


*4.10 The Multivariate Normal Distribution


We have studied in some detail normal distributions of one 
random variable. In this section we investigate a joint distribution 
of n random variables that will be called a multivariate normal 
distribution. This investigation assumes that the student is familiar 
with elementary matrix algebra, with real symmetric quadratic forms, 
and with orthogonal transformations. Henceforth, the expression 
quadratic form means a quadratic form in a prescribed number of 
variables whose matrix is real and symmetric. All symbols that 
represent matrices will be set in boldface type. 

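Let $\mathbf{A}$ denote an $n \times n$ real symmetric matrix which is positive
definite. Let $\boldsymbol{\mu}$ denote the $n \times 1$ matrix such that $\boldsymbol{\mu}'$, the transpose of
$\boldsymbol{\mu}$, is $\boldsymbol{\mu}' = [\mu_1, \mu_2, \ldots, \mu_n]$, where each $\mu_i$ is a real constant. Finally, let
$\mathbf{x}$ denote the $n \times 1$ matrix such that $\mathbf{x}' = [x_1, x_2, \ldots, x_n]$. We shall
show that if $C$ is an appropriately chosen positive constant, the
nonnegative function

$$f(x_1, x_2, \ldots, x_n) = C \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right], \qquad -\infty < x_i < \infty, \quad i = 1, 2, \ldots, n,$$

is a joint p.d.f. of $n$ random variables $X_1, X_2, \ldots, X_n$ that are of the
continuous type. Thus we need to show that

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = 1. \tag{1}$$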
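Let $\mathbf{t}$ denote the $n \times 1$ matrix such that $\mathbf{t}' = [t_1, t_2, \ldots, t_n]$, where
$t_1, t_2, \ldots, t_n$ are arbitrary real numbers. We shall evaluate the integral

$$C \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[\mathbf{t}'\mathbf{x} - \frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right] dx_1 \cdots dx_n, \tag{2}$$

and then we shall subsequently set $t_1 = t_2 = \cdots = t_n = 0$, and thus
establish Equation (1). First, we change the variables of integration in
integral (2) from $x_1, x_2, \ldots, x_n$ to $y_1, y_2, \ldots, y_n$ by writing $\mathbf{x} - \boldsymbol{\mu} = \mathbf{y}$,
where $\mathbf{y}' = [y_1, y_2, \ldots, y_n]$. The Jacobian of the transformation is one
and the $n$-dimensional $x$-space is mapped onto an $n$-dimensional
$y$-space, so that integral (2) may be written as

$$C \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left(\mathbf{t}'\mathbf{y} - \frac{\mathbf{y}'\mathbf{A}\mathbf{y}}{2}\right) dy_1 \cdots dy_n. \tag{3}$$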


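Because the real symmetric matrix $\mathbf{A}$ is positive definite, the $n$
characteristic numbers (proper values, latent roots, or eigenvalues)
$a_1, a_2, \ldots, a_n$ of $\mathbf{A}$ are positive. There exists an appropriately chosen
$n \times n$ real orthogonal matrix $\mathbf{L}$ ($\mathbf{L}' = \mathbf{L}^{-1}$, where $\mathbf{L}^{-1}$ is the inverse
of $\mathbf{L}$) such that

$$\mathbf{L}'\mathbf{A}\mathbf{L} = \begin{pmatrix} a_1 & 0 & \cdots & 0 \\ 0 & a_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_n \end{pmatrix}$$

for a suitable ordering of $a_1, a_2, \ldots, a_n$. We shall sometimes write
$\mathbf{L}'\mathbf{A}\mathbf{L} = \operatorname{diag}[a_1, a_2, \ldots, a_n]$. In integral (3), we shall change the
variables of integration from $y_1, y_2, \ldots, y_n$ to $z_1, z_2, \ldots, z_n$ by writing
$\mathbf{y} = \mathbf{L}\mathbf{z}$, where $\mathbf{z}' = [z_1, z_2, \ldots, z_n]$. The Jacobian of the transformation
is the determinant of the orthogonal matrix $\mathbf{L}$. Since $\mathbf{L}'\mathbf{L} = \mathbf{I}_n$, where
$\mathbf{I}_n$ is the unit matrix of order $n$, we have the determinant $|\mathbf{L}'\mathbf{L}| = 1$ and
$|\mathbf{L}|^2 = 1$. Thus the absolute value of the Jacobian is one. Moreover, the
$n$-dimensional $y$-space is mapped onto an $n$-dimensional $z$-space. The
integral (3) becomes

$$C \exp(\mathbf{t}'\boldsymbol{\mu}) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left[\mathbf{t}'\mathbf{L}\mathbf{z} - \frac{\mathbf{z}'(\mathbf{L}'\mathbf{A}\mathbf{L})\mathbf{z}}{2}\right] dz_1 \cdots dz_n. \tag{4}$$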

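It is computationally convenient to write, momentarily, $\mathbf{t}'\mathbf{L} = \mathbf{w}'$,
where $\mathbf{w}' = [w_1, w_2, \ldots, w_n]$. Then

$$\exp[\mathbf{t}'\mathbf{L}\mathbf{z}] = \exp[\mathbf{w}'\mathbf{z}] = \exp\left(\sum_{i=1}^{n} w_i z_i\right).$$

Moreover,

$$\exp\left[-\frac{\mathbf{z}'(\mathbf{L}'\mathbf{A}\mathbf{L})\mathbf{z}}{2}\right] = \exp\left[-\frac{\sum_{i=1}^{n} a_i z_i^2}{2}\right].$$

Then integral (4) may be written as the product of $n$ integrals in the
following manner:

$$C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\int_{-\infty}^{\infty} \exp\left(w_i z_i - \frac{a_i z_i^2}{2}\right) dz_i\right]$$

$$= C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\sqrt{\frac{2\pi}{a_i}} \int_{-\infty}^{\infty} \frac{\exp\left(w_i z_i - \frac{a_i z_i^2}{2}\right)}{\sqrt{2\pi/a_i}}\, dz_i\right]. \tag{5}$$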


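The integral that involves $z_i$ can be treated as the m.g.f., with the more
familiar symbol $t$ replaced by $w_i$, of a distribution which is $N(0, 1/a_i)$.
Thus the right-hand member of Equation (5) is equal to

$$C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \prod_{i=1}^{n} \left[\sqrt{\frac{2\pi}{a_i}} \exp\left(\frac{w_i^2}{2a_i}\right)\right] = C \exp(\mathbf{w}'\mathbf{L}'\boldsymbol{\mu}) \sqrt{\frac{(2\pi)^n}{a_1 a_2 \cdots a_n}} \exp\left(\sum_{i=1}^{n} \frac{w_i^2}{2a_i}\right). \tag{6}$$

Now, because $\mathbf{L}^{-1} = \mathbf{L}'$, we have

$$(\mathbf{L}'\mathbf{A}\mathbf{L})^{-1} = \mathbf{L}'\mathbf{A}^{-1}\mathbf{L} = \operatorname{diag}\left[\frac{1}{a_1}, \frac{1}{a_2}, \ldots, \frac{1}{a_n}\right].$$

Thus

$$\sum_{i=1}^{n} \frac{w_i^2}{a_i} = \mathbf{w}'(\mathbf{L}'\mathbf{A}^{-1}\mathbf{L})\mathbf{w} = (\mathbf{L}\mathbf{w})'\mathbf{A}^{-1}(\mathbf{L}\mathbf{w}) = \mathbf{t}'\mathbf{A}^{-1}\mathbf{t}.$$

Moreover, the determinant $|\mathbf{A}^{-1}|$ of $\mathbf{A}^{-1}$ is

$$|\mathbf{A}^{-1}| = |\mathbf{L}'\mathbf{A}^{-1}\mathbf{L}| = \frac{1}{a_1 a_2 \cdots a_n}.$$

Accordingly, the right-hand member of Equation (6), which is equal
to integral (2), may be written as

$$C e^{\mathbf{t}'\boldsymbol{\mu}} \sqrt{(2\pi)^n |\mathbf{A}^{-1}|}\, \exp\left(\frac{\mathbf{t}'\mathbf{A}^{-1}\mathbf{t}}{2}\right). \tag{7}$$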
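The passage from Equation (5) to Equation (6) rests on the value of each one-dimensional integral. As a check that is not spelled out in the text, completing the square in the exponent gives the same factor that the m.g.f. argument produces:

```latex
\begin{aligned}
\int_{-\infty}^{\infty} \exp\left(w_i z_i - \frac{a_i z_i^2}{2}\right) dz_i
  &= \exp\left(\frac{w_i^2}{2a_i}\right)
     \int_{-\infty}^{\infty} \exp\left[-\frac{a_i}{2}\left(z_i - \frac{w_i}{a_i}\right)^2\right] dz_i \\
  &= \sqrt{\frac{2\pi}{a_i}}\, \exp\left(\frac{w_i^2}{2a_i}\right), \qquad a_i > 0 .
\end{aligned}
```

The positivity of each $a_i$, which follows from the positive definiteness of $\mathbf{A}$, is what makes the Gaussian integral converge.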
If, in this function, we set $t_1 = t_2 = \cdots = t_n = 0$, we have the value of
the left-hand member of Equation (1). Thus we have

$$C \sqrt{(2\pi)^n |\mathbf{A}^{-1}|} = 1.$$

Accordingly, the function

$$f(x_1, x_2, \ldots, x_n) = \frac{1}{(2\pi)^{n/2} \sqrt{|\mathbf{A}^{-1}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{A}(\mathbf{x} - \boldsymbol{\mu})}{2}\right],$$

$-\infty < x_i < \infty$, $i = 1, 2, \ldots, n$, is a joint p.d.f. of $n$ random variables
$X_1, X_2, \ldots, X_n$ that are of the continuous type. Such a p.d.f. is called
a nonsingular multivariate normal p.d.f.

We have now proved that $f(x_1, x_2, \ldots, x_n)$ is a p.d.f. However,
we have proved more than that. Because $f(x_1, x_2, \ldots, x_n)$ is a p.d.f.,
integral (2) is the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of this joint distribution of
probability. Since integral (2) is equal to function (7), the m.g.f. of the
multivariate normal distribution is given by

$$M(t_1, t_2, \ldots, t_n) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{A}^{-1}\mathbf{t}}{2}\right).$$



Let the elements of the real, symmetric, and positive definite matrix
$\mathbf{A}^{-1}$ be denoted by $\sigma_{ij}$, $i, j = 1, 2, \ldots, n$. Then

$$M(0, \ldots, 0, t_i, 0, \ldots, 0) = \exp\left(t_i \mu_i + \frac{\sigma_{ii} t_i^2}{2}\right)$$

is the m.g.f. of $X_i$, $i = 1, 2, \ldots, n$. Thus $X_i$ is $N(\mu_i, \sigma_{ii})$, $i = 1, 2, \ldots, n$.
Moreover, with $i \neq j$, we see that $M(0, \ldots, 0, t_i, 0, \ldots, 0, t_j, 0, \ldots, 0)$,
the m.g.f. of $X_i$ and $X_j$, is equal to

$$\exp\left(t_i \mu_i + t_j \mu_j + \frac{\sigma_{ii} t_i^2 + 2\sigma_{ij} t_i t_j + \sigma_{jj} t_j^2}{2}\right),$$

which is the m.g.f. of a bivariate normal distribution. In Exercise 4.131
the reader is asked to show that $\sigma_{ij}$ is the covariance of the random
variables $X_i$ and $X_j$. Thus the matrix $\boldsymbol{\mu}$, where $\boldsymbol{\mu}' = [\mu_1, \mu_2, \ldots, \mu_n]$,
is the matrix of the means of the random variables $X_1, \ldots, X_n$.

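Moreover, the elements on the principal diagonal of $\mathbf{A}^{-1}$ are,
respectively, the variances $\sigma_{ii} = \sigma_i^2$, $i = 1, 2, \ldots, n$, and the elements
not on the principal diagonal of $\mathbf{A}^{-1}$ are, respectively, the covariances
$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$, $i \neq j$, of the random variables $X_1, X_2, \ldots, X_n$. We call the
matrix $\mathbf{A}^{-1}$, which is given by

$$\mathbf{A}^{-1} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{12} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & & \vdots \\ \sigma_{1n} & \sigma_{2n} & \cdots & \sigma_{nn} \end{pmatrix},$$

the covariance matrix of the multivariate normal distribution and
henceforth we shall denote this matrix by the symbol $\mathbf{V}$. In terms
of the positive definite covariance matrix $\mathbf{V}$, the multivariate normal
p.d.f. is written

$$\frac{1}{(2\pi)^{n/2} \sqrt{|\mathbf{V}|}} \exp\left[-\frac{(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})}{2}\right], \qquad -\infty < x_i < \infty,$$

$i = 1, 2, \ldots, n$, and the m.g.f. of this distribution is given by

$$\exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{V}\mathbf{t}}{2}\right)$$

for all real values of $\mathbf{t}$.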

Note that this m.g.f. equals the product of $n$ functions, where
the first is a function of $t_1$ alone, the second is a function of $t_2$ alone,
and so on, if and only if $\mathbf{V}$ is a diagonal matrix. This condition,
$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j = 0$, means $\rho_{ij} = 0$, $i \neq j$. That is, the multivariate normal
random variables are independent if and only if $\rho_{ij} = 0$ for all $i \neq j$.
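As a numerical illustration of the p.d.f. written in terms of $\mathbf{V}$, the following sketch evaluates the bivariate case ($n = 2$) and checks by a midpoint rule that it integrates to approximately one. The function names `mvn_pdf_2d` and `grid_integral`, the grid settings, and the particular values of the means and covariance matrix are assumptions chosen for illustration; they do not come from the text.

```python
import math

def mvn_pdf_2d(x, mu, V):
    """Nonsingular bivariate normal p.d.f. with means mu and 2x2 covariance matrix V."""
    (a, b), (c, d) = V
    det = a * d - b * c                      # |V|, positive when V is positive definite
    # Inverse of a 2x2 matrix, written out explicitly.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    y = [x[0] - mu[0], x[1] - mu[1]]         # y = x - mu
    quad = (y[0] * (inv[0][0] * y[0] + inv[0][1] * y[1])
            + y[1] * (inv[1][0] * y[0] + inv[1][1] * y[1]))   # (x - mu)' V^{-1} (x - mu)
    return math.exp(-quad / 2.0) / (2.0 * math.pi * math.sqrt(det))

def grid_integral(mu, V, half_width=8.0, steps=400):
    """Midpoint-rule approximation to the integral of the p.d.f. over a square centered at mu."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x1 = mu[0] - half_width + (i + 0.5) * h
        for j in range(steps):
            x2 = mu[1] - half_width + (j + 0.5) * h
            total += mvn_pdf_2d([x1, x2], mu, V)
    return total * h * h

# Hypothetical parameter values, chosen only for the illustration.
mu = [1.0, -2.0]
V = [[2.0, 0.6], [0.6, 1.0]]
print(grid_integral(mu, V))   # should be close to 1
```

The square is wide enough (more than five standard deviations in each direction for these values) that the truncated tails contribute negligibly to the sum.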


Example 1. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution
with matrix $\boldsymbol{\mu}$ of means and positive definite covariance matrix $\mathbf{V}$. If we let
$\mathbf{X}' = [X_1, X_2, \ldots, X_n]$, then the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of this joint
distribution of probability is

$$E(e^{\mathbf{t}'\mathbf{X}}) = \exp\left(\mathbf{t}'\boldsymbol{\mu} + \frac{\mathbf{t}'\mathbf{V}\mathbf{t}}{2}\right). \tag{8}$$

Consider a linear function $Y$ of $X_1, X_2, \ldots, X_n$ which is defined by
$Y = \mathbf{c}'\mathbf{X} = \sum_{i=1}^{n} c_i X_i$, where $\mathbf{c}' = [c_1, c_2, \ldots, c_n]$ and the several $c_i$ are real and not
all zero. We wish to find the p.d.f. of $Y$. The m.g.f. $m(t)$ of the distribution
of $Y$ is given by

$$m(t) = E(e^{tY}) = E(e^{t\mathbf{c}'\mathbf{X}}).$$

Now the expectation (8) exists for all real values of $\mathbf{t}$. Thus we can replace $\mathbf{t}'$
in expectation (8) by $t\mathbf{c}'$ and obtain

$$m(t) = \exp\left(t\mathbf{c}'\boldsymbol{\mu} + \frac{\mathbf{c}'\mathbf{V}\mathbf{c}\, t^2}{2}\right).$$

Thus the random variable $Y$ is $N(\mathbf{c}'\boldsymbol{\mu}, \mathbf{c}'\mathbf{V}\mathbf{c})$.
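The result of the example reduces to two matrix products. A minimal sketch follows, assuming plain nested lists for the matrices; the helper name `linear_function_params` and the numerical values of the means, covariance matrix, and coefficients are hypothetical and serve only to show the arithmetic.

```python
def linear_function_params(c, mu, V):
    """Return (c'mu, c'Vc), the mean and variance of Y = c'X."""
    n = len(c)
    mean = sum(c[i] * mu[i] for i in range(n))
    var = sum(c[i] * V[i][j] * c[j] for i in range(n) for j in range(n))
    return mean, var

# Hypothetical values: V is symmetric and positive definite.
mu = [1.0, 0.0, -1.0]
V = [[4.0, 1.0, 0.0],
     [1.0, 2.0, 0.5],
     [0.0, 0.5, 1.0]]
c = [1.0, -2.0, 3.0]

mean, var = linear_function_params(c, mu, V)
print(mean, var)   # Y is N(mean, var)
```

For these values the mean is $1 - 0 - 3 = -2$ and the variance is $(4 + 8 + 9) - 4 - 6 = 11$, the off-diagonal terms entering twice because $\mathbf{V}$ is symmetric.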


EXERCISES 


4.129. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with
positive definite covariance matrix $\mathbf{V}$. Prove that these random variables are
mutually independent if and only if $\mathbf{V}$ is a diagonal matrix.

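4.130. Let $n = 2$ and take

$$\mathbf{V} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$

Determine $|\mathbf{V}|$, $\mathbf{V}^{-1}$, and $(\mathbf{x} - \boldsymbol{\mu})'\mathbf{V}^{-1}(\mathbf{x} - \boldsymbol{\mu})$. Compare the bivariate normal
p.d.f. of Section 3.5 with this multivariate normal p.d.f. when $n = 2$.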

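4.131. Let $m(t_i, t_j)$ represent the m.g.f. of $X_i$ and $X_j$ as given in the text.
Show that

$$\frac{\partial^2 m(0, 0)}{\partial t_i\, \partial t_j} - \left[\frac{\partial m(0, 0)}{\partial t_i}\right] \left[\frac{\partial m(0, 0)}{\partial t_j}\right] = \sigma_{ij};$$

that is, prove that the covariance of $X_i$ and $X_j$ is $\sigma_{ij}$, which appears in
that formula for $m(t_i, t_j)$.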

4.132. Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution, where
$\boldsymbol{\mu}$ is the matrix of the means and $\mathbf{V}$ is the positive definite covariance matrix.
Let $Y = \mathbf{c}'\mathbf{X}$ and $Z = \mathbf{d}'\mathbf{X}$, where $\mathbf{X}' = [X_1, \ldots, X_n]$, $\mathbf{c}' = [c_1, \ldots, c_n]$, and
$\mathbf{d}' = [d_1, \ldots, d_n]$ are real matrices.

(a) Find $m(t_1, t_2) = E(e^{t_1 Y + t_2 Z})$ to see that $Y$ and $Z$ have a bivariate normal
distribution.

(b) Prove that $Y$ and $Z$ are independent if and only if $\mathbf{c}'\mathbf{V}\mathbf{d} = 0$.

(c) If $X_1, X_2, \ldots, X_n$ are independent random variables which have the
same variance $\sigma^2$, show that the necessary and sufficient condition of
part (b) becomes $\mathbf{c}'\mathbf{d} = 0$.


4.133. Let $\mathbf{X}' = [X_1, \ldots, X_n]$ have the multivariate normal distribution of
Exercise 4.132. Consider the $p$ linear functions of $X_1, \ldots, X_n$ defined by
$\mathbf{W} = \mathbf{B}\mathbf{X}$, where $\mathbf{W}' = [W_1, \ldots, W_p]$, $p < n$, and $\mathbf{B}$ is a $p \times n$ real matrix of
rank $p$. Find $m(v_1, \ldots, v_p) = E(e^{\mathbf{v}'\mathbf{W}})$, where $\mathbf{v}'$ is the real matrix $[v_1, \ldots, v_p]$,
to see that $W_1, \ldots, W_p$ have a $p$-variate normal distribution which has $\mathbf{B}\boldsymbol{\mu}$
for the matrix of the means and $\mathbf{B}\mathbf{V}\mathbf{B}'$ for the covariance matrix.
4.134. Let $\mathbf{X}' = [X_1, X_2, \ldots, X_n]$ have the $n$-variate normal distribution
of Exercise 4.132. Show that $X_1, X_2, \ldots, X_p$, $p < n$, have a $p$-variate
normal distribution. What submatrix of $\mathbf{V}$ is the covariance matrix of
$X_1, X_2, \ldots, X_p$?

Hint: In the m.g.f. $M(t_1, t_2, \ldots, t_n)$ of $X_1, X_2, \ldots, X_n$, let $t_{p+1} = \cdots =
t_n = 0$.

ADDITIONAL EXERCISES 

4.135. If $X$ has the p.d.f. $f(x) = \frac{1}{3}$, $-1 < x < 2$, zero elsewhere, find the p.d.f.
of $Y = X^2$.

4.136. The continuous random variable $X$ has a p.d.f. given by $f(x) = 1$,
$0 < x < 1$, zero elsewhere. The random variable $Y$ is such that
$Y = -2 \ln X$. What is the distribution of $Y$? What are the mean and the
variance of $Y$?

4.137. Let $X_1$, $X_2$ be a random sample of size $n = 2$ from a Poisson
distribution with mean $\mu$. If $\Pr(X_1 + X_2 = 3) = (\frac{32}{3})e^{-4}$, compute $\Pr(X_1 = 2,
X_2 = 4)$.

4.138. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size $n = 25$ from a
distribution with p.d.f. $f(x) = 3/x^4$, $1 < x < \infty$, zero elsewhere. Let $Y$ equal
the number of these $X$ values less than or equal to 2. What is the distribution
of $Y$?

4.139. Find the probability that the range of a random sample of size 3 from
the uniform distribution over the interval $(-5, 5)$ is less than 7.

4.140* Let Y l < F 2 < F 3 be the order statistics of a sample of size 3 from a 
distribution having pAS,J{x) = }， 一 I < x <2 y zero elsewhere. Determine 

4.141. Let X and K be random variables so that Z — X-2Y has variance 
equal to 28* If = 4 and p XY = \ , find the variance of Y. 

4.142. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample
of size $n = 4$ from a distribution with p.d.f. $f(x) = 2(1 - x)$, $0 < x < 1$,
zero elsewhere. Compute $\Pr(Y_1 < 0.1)$.

4.143. A certain job is completed in three steps in series. The means and
standard deviations for the steps are (in hours):

Step    Mean    Standard Deviation
1       3       0.2
2       1       0.1
3       4       0.2

Assuming normal distributions and independent steps, compute the
probability that the job will take less than 7.6 hours to complete.


4.144. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution
having mean $\mu$ and variance 25. Use Chebyshev's inequality to determine
the smallest value of $n$ so that 0.75 is a lower bound for $\Pr[|\bar{X} - \mu| < 1]$.

4.145. Let $X_1$ and $X_2$ be independent random variables with joint p.d.f.

$$f(x_1, x_2) = \frac{x_1(4 - x_2)}{36}, \qquad x_1 = 1, 2, 3, \quad x_2 = 1, 2, 3,$$

and zero elsewhere. Find the p.d.f. of $Y = X_1 - X_2$.

4.146. An unbiased die is cast eight independent times. Let $Y$ be the smallest
of the eight numbers obtained. Find the p.d.f. of $Y$.

4.147. Let X u X 2y X 3 be iAA. #(/i ， o 2 ) and define 

Y\ == Jfj + 
and 

* 1 T 

(a) Find the means and variances of K, and V 2 and their correlation 
coefficient* 

(l>) Find the joint m.g.f. of Y } and Y 2 , 

4.148 ‘ The following were obtained from two sets of data; 

=20 ， x = 25, 4 = 5 ， 

r ■ 

n i = 30, J = 20 ， ^ = 4. 

Find the mean and variance of the combined sample.

4.149. Let Vi < Y 2 < ^ - < Y 5 be the order statistics of a random sample 
of size 5 from a distribution that has the pAJ.J(x) = I t 0 < jc< I, zero 
elsewhere. Compute ¥r(Y t <|, K 5 > |). ? 

4J50. Let M(t) = (I - /)"\ / < I, be the m.g.f. of X Find the m.gX of 

Y — ^ — 10 

4.151. Let $\bar{X}$ be the mean of a random sample of size $n$ from a normal
distribution with mean $\mu$ and variance $\sigma^2 = 64$. Find $n$ so that
$\Pr(\mu - 6 < \bar{X} < \mu + 6) = 0.9973$.

4.152. Find the probability of obtaining a total of 14 in one toss of four dice.
4.153. Two independent random samples, each of size 6, are taken from two
normal distributions having common variance $\sigma^2$. If $W_1$ and $W_2$ are the
variances of these respective samples, find the constant $k$ such that


Pr 


min (最，货) 


0 . 10 . 


4.154. The mean and variance of 9 observations are 4 and 14, respectively.
We find that a tenth observation equals 6. Find the mean and the variance
of the 10 observations.

4.155. Draw 15 cards at random and without replacement from a pack of 25
cards numbered 1, 2, 3, ..., 25. Find the probability that 10 is the median
of the cards selected.

4.156. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of
size $n = 4$ from a uniform distribution over the interval $(0, 1)$.

(a) Find the joint p.d.f. of $Y_1$ and $Y_4$.

(b) Determine the conditional p.d.f. of $Y_2$ and $Y_3$, given $Y_1 = y_1$ and
$Y_4 = y_4$.

(c) Find the joint p.d.f. of $Z_1 = Y_1/Y_4$ and $Z_2 = Y_4$.

4.157. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
mean $\mu$ and variance $\sigma^2$. Consider the second differences

$$Z_j = X_{j+2} - 2X_{j+1} + X_j, \qquad j = 1, 2, \ldots, n - 2.$$

Compute the variance of the average, $\sum_{j=1}^{n-2} Z_j/(n - 2)$, of the second
differences.


4.158. Let $X$ and $Y$ have a bivariate normal distribution. Show that $X + Y$
and $X - Y$ are independent if and only if $\sigma_1^2 = \sigma_2^2$.

4.159. Let $X$ be a Poisson random variable with mean $\mu$. If the conditional
distribution of $Y$, given $X = x$, is $b(x, p)$, show that $Y$ has a Poisson
distribution and is independent of $X - Y$.

4.160. Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$. Show that the
sample mean $\bar{X}$ and each $X_i - \bar{X}$, $i = 1, 2, \ldots, n$, are independent. Actually
$\bar{X}$ and the vector $(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X})$ are independent and this
implies that $\bar{X}$ and $\sum_{i=1}^{n} (X_i - \bar{X})^2$ are independent. Thus we could find the
joint distribution of $\bar{X}$ and $nS^2/\sigma^2$ using this result.

4.161. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
p.d.f. $f(x) = \frac{1}{6}$, $x = 1, 2, \ldots, 6$, zero elsewhere. Let $Y = \min(X_i)$ and
$Z = \max(X_i)$. Say that the joint distribution function of $Y$ and $Z$ is
$G(y, z) = \Pr(Y \le y, Z \le z)$, where $y$ and $z$ are nonnegative integers such
that $1 \le y \le z \le 6$.

(a) Show that

$$G(y, z) = F^n(z) - [F(z) - F(y)]^n, \qquad 1 \le y \le z \le 6,$$

where $F(x)$ is the distribution function associated with $f(x)$.

Hint: Note that the event $(Z \le z) = (Y \le y, Z \le z) \cup (y < Y, Z \le z)$.

(b) Find the joint p.d.f. of $Y$ and $Z$ by evaluating

$$g(y, z) = G(y, z) - G(y - 1, z) - G(y, z - 1) + G(y - 1, z - 1).$$

4*162. Let X = (X u X 2i have a multivariate normal distribution with 
mean vector |i = (6, 一 2, I)' and covariance matrix 

. r i o - 1 1 

V = 0 2 i , 

一 1 1 3 

Find the joint p.d.f. of 

K, - 3X t + X 2 ^ 2X % and Y 2 ^X y - 5X 2 + X,. 

4.163. If

$$\mathbf{V} = \begin{pmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{pmatrix}$$

is a covariance matrix, what can be said about the value of $\rho$?










CHAPTER 5

Limiting Distributions

5.1 Convergence in Distribution 

In some of the preceding chapters it has been demonstrated by
example that the distribution of a random variable (perhaps a statistic)
often depends upon a positive integer $n$. For example, if the random
variable $X$ is $b(n, p)$, the distribution of $X$ depends upon $n$. If $\bar{X}$ is the
mean of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$,
then $\bar{X}$ is itself $N(\mu, \sigma^2/n)$ and the distribution of $\bar{X}$ depends upon $n$. If
$S^2$ is the variance of this random sample from the normal distribution
to which we have just referred, the random variable $nS^2/\sigma^2$ is $\chi^2(n - 1)$,
and so the distribution of this random variable depends upon $n$.

We know from experience that the determination of the probability
density function of a random variable can, upon occasion, present
rather formidable computational difficulties. For example, if $\bar{X}$ is the
mean of a random sample $X_1, X_2, \ldots, X_n$ from a distribution that has
the following p.d.f.

$$f(x) = 1, \quad 0 < x < 1,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere},$$

then (Exercise 4.85) the m.g.f. of $\bar{X}$ is given by $[M(t/n)]^n$, where here

$$M(t) = \int_0^1 e^{tx}\, dx = \frac{e^t - 1}{t}, \quad t \neq 0,$$
$$\phantom{M(t)} = 1, \quad t = 0.$$

Hence

$$E(e^{t\bar{X}}) = \left(\frac{e^{t/n} - 1}{t/n}\right)^n, \quad t \neq 0,$$
$$\phantom{E(e^{t\bar{X}})} = 1, \quad t = 0.$$

Since the m.g.f. of $\bar{X}$ depends upon $n$, the distribution of $\bar{X}$ depends
upon $n$. It is true that various mathematical techniques can be used to
determine the p.d.f. of $\bar{X}$ for a fixed, but arbitrarily fixed, positive
integer $n$. But the p.d.f. is so complicated that few, if any, of us would
be interested in using it to compute probabilities about $\bar{X}$. One of the
purposes of this chapter is to provide ways of approximating, for large
values of $n$, some of these complicated probability density functions.

Consider a distribution that depends upon the positive integer $n$.
Clearly, the distribution function $F$ of that distribution will also
depend upon $n$. Throughout this chapter, we denote this fact by writing
the distribution function as $F_n$ and the corresponding p.d.f. as $f_n$.
Moreover, to emphasize the fact that we are working with sequences
of distribution functions and random variables, we place a subscript
$n$ on the random variables. For example, we shall write

$$F_n(\bar{x}) = \int_{-\infty}^{\bar{x}} \frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\, e^{-nw^2/2}\, dw$$

for the distribution function of the mean $\bar{X}_n$ of a random sample of size
$n$ from a normal distribution with mean zero and variance 1.

We now define convergence in distribution of a sequence of
random variables.

Definition 1. Let the distribution function $F_n(y)$ of the random
variable $Y_n$ depend upon $n$, $n = 1, 2, 3, \ldots$. If $F(y)$ is a distribution
function and if $\lim_{n \to \infty} F_n(y) = F(y)$ for every point $y$ at which $F(y)$ is
continuous, then the sequence of random variables, $Y_1, Y_2, \ldots$,
converges in distribution to a random variable with distribution
function $F(y)$.

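The following examples are illustrative of this convergence in
distribution.

Example 1. Let $Y_n$ denote the $n$th order statistic of a random sample
$X_1, X_2, \ldots, X_n$ from a distribution having p.d.f.

$$f(x) = \frac{1}{\theta}, \quad 0 < x < \theta, \quad 0 < \theta < \infty,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere}.$$

The p.d.f. of $Y_n$ is

$$g_n(y) = \frac{n y^{n-1}}{\theta^n}, \quad 0 < y < \theta,$$
$$\phantom{g_n(y)} = 0 \quad \text{elsewhere},$$

and the distribution function of $Y_n$ is

$$F_n(y) = 0, \quad y < 0,$$
$$\phantom{F_n(y)} = \int_0^y \frac{n z^{n-1}}{\theta^n}\, dz = \left(\frac{y}{\theta}\right)^n, \quad 0 \le y < \theta,$$
$$\phantom{F_n(y)} = 1, \quad \theta \le y < \infty.$$

Then

$$\lim_{n \to \infty} F_n(y) = 0, \quad -\infty < y < \theta,$$
$$\phantom{\lim_{n \to \infty} F_n(y)} = 1, \quad \theta \le y < \infty.$$

Now

$$F(y) = 0, \quad -\infty < y < \theta,$$
$$\phantom{F(y)} = 1, \quad \theta \le y < \infty,$$

is a distribution function. Moreover, $\lim_{n \to \infty} F_n(y) = F(y)$ at each point of
continuity of $F(y)$. Recall that a distribution of the discrete type which has
a probability of 1 at a single point has been called a degenerate distribution.
Thus, in this example, the sequence of the $n$th order statistics, $Y_n$,
$n = 1, 2, 3, \ldots$, converges in distribution to a random variable that has a
degenerate distribution at the point $y = \theta$.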
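The convergence in this example can be observed numerically. The sketch below, which assumes the illustrative value $\theta = 2$ and uses a hypothetical function name `F_n`, evaluates the distribution function $(y/\theta)^n$ just below $\theta$ and at $\theta$ for increasing $n$.

```python
def F_n(y, n, theta):
    """Distribution function of the largest order statistic of n uniform(0, theta) observations."""
    if y < 0:
        return 0.0
    if y < theta:
        return (y / theta) ** n
    return 1.0

theta = 2.0   # hypothetical parameter value
for n in (1, 10, 100, 1000):
    # The first value tends to 0 (y < theta); the second is always 1 (y = theta).
    print(n, F_n(1.9, n, theta), F_n(2.0, n, theta))
```

For every fixed $y < \theta$ the value is driven to zero, while at and beyond $\theta$ it is one for all $n$, which is the degenerate limit described above.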
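Example 2. Let $\bar{X}_n$ have the distribution function

$$F_n(\bar{x}) = \int_{-\infty}^{\bar{x}} \frac{1}{\sqrt{1/n}\,\sqrt{2\pi}}\, e^{-nw^2/2}\, dw.$$

If the change of variable $v = \sqrt{n}\,w$ is made, we have

$$F_n(\bar{x}) = \int_{-\infty}^{\sqrt{n}\,\bar{x}} \frac{1}{\sqrt{2\pi}}\, e^{-v^2/2}\, dv.$$

It is clear that

$$\lim_{n \to \infty} F_n(\bar{x}) = 0, \quad \bar{x} < 0,$$
$$\phantom{\lim_{n \to \infty} F_n(\bar{x})} = \tfrac{1}{2}, \quad \bar{x} = 0,$$
$$\phantom{\lim_{n \to \infty} F_n(\bar{x})} = 1, \quad \bar{x} > 0.$$

Now the function

$$F(\bar{x}) = 0, \quad \bar{x} < 0,$$
$$\phantom{F(\bar{x})} = 1, \quad \bar{x} \ge 0,$$

is a distribution function and $\lim_{n \to \infty} F_n(\bar{x}) = F(\bar{x})$ at every point of continuity of
$F(\bar{x})$. To be sure, $\lim_{n \to \infty} F_n(0) \neq F(0)$, but $F(\bar{x})$ is not continuous at $\bar{x} = 0$.
Accordingly, the sequence $\bar{X}_1, \bar{X}_2, \bar{X}_3, \ldots$ converges in distribution to a
random variable that has a degenerate distribution at $\bar{x} = 0$.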
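After the change of variable, the distribution function in this example is the standard normal distribution function evaluated at $\sqrt{n}\,\bar{x}$. A minimal sketch, assuming the standard identity $\Phi(z) = \tfrac{1}{2}[1 + \operatorname{erf}(z/\sqrt{2})]$ and a hypothetical function name `F_n`, shows the three limiting values.

```python
import math

def F_n(x, n):
    """Distribution function of the mean of n N(0, 1) observations: Phi(sqrt(n) * x)."""
    return 0.5 * (1.0 + math.erf(math.sqrt(n) * x / math.sqrt(2.0)))

for n in (1, 100, 10000):
    # Values at a negative point, at zero, and at a positive point.
    print(n, F_n(-0.1, n), F_n(0.0, n), F_n(0.1, n))
```

The value at zero stays at one half for every $n$, which is why the limit of $F_n(0)$ disagrees with $F(0) = 1$; the definition of convergence in distribution ignores that single point of discontinuity.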

Example 3. Even if a sequence $X_1, X_2, X_3, \ldots$ converges in distribution to
a random variable $X$, we cannot in general determine the distribution of $X$ by
taking the limit of the p.d.f. of $X_n$. This is illustrated by letting $X_n$ have the
p.d.f.

$$f_n(x) = 1, \quad x = 2 + \frac{1}{n},$$
$$\phantom{f_n(x)} = 0 \quad \text{elsewhere}.$$

Clearly, $\lim_{n \to \infty} f_n(x) = 0$ for all values of $x$. This may suggest that $X_n$,
$n = 1, 2, 3, \ldots$, does not converge in distribution. However, the distribution
function of $X_n$ is

$$F_n(x) = 0, \quad x < 2 + \frac{1}{n},$$
$$\phantom{F_n(x)} = 1, \quad x \ge 2 + \frac{1}{n},$$

and

$$\lim_{n \to \infty} F_n(x) = 0, \quad x \le 2,$$
$$\phantom{\lim_{n \to \infty} F_n(x)} = 1, \quad x > 2.$$

Since

$$F(x) = 0, \quad x < 2,$$
$$\phantom{F(x)} = 1, \quad x \ge 2,$$

is a distribution function, and since $\lim_{n \to \infty} F_n(x) = F(x)$ at all points of
continuity of $F(x)$, the sequence $X_1, X_2, X_3, \ldots$ converges in distribution to
a random variable with distribution function $F(x)$.

It is interesting to note that although we refer to a sequence of
random variables, $X_1, X_2, X_3, \ldots$, converging in distribution to a
random variable $X$ having some distribution function $F(x)$, it is actually
the distribution functions $F_1, F_2, F_3, \ldots$ that converge. That is,

$$\lim_{n \to \infty} F_n(x) = F(x)$$

at all points $x$ for which $F(x)$ is continuous. For that reason we often
find it convenient to refer to $F(x)$ as the limiting distribution. Moreover,
it is then a little easier to say that $X_n$, representing the sequence
$X_1, X_2, X_3, \ldots$, has a limiting distribution with distribution function
$F(x)$. Henceforth, we use this terminology.

Example 4. Let $Y_n$ denote the $n$th order statistic of a random sample from
the uniform distribution of Example 1. Let $Z_n = n(\theta - Y_n)$. The p.d.f. of $Z_n$
is

$$h_n(z) = \frac{(\theta - z/n)^{n-1}}{\theta^n}, \quad 0 < z < n\theta,$$
$$\phantom{h_n(z)} = 0 \quad \text{elsewhere},$$

and the distribution function of $Z_n$ is

$$G_n(z) = 0, \quad z < 0,$$
$$\phantom{G_n(z)} = \int_0^z \frac{(\theta - w/n)^{n-1}}{\theta^n}\, dw = 1 - \left(1 - \frac{z}{n\theta}\right)^n, \quad 0 \le z < n\theta,$$
$$\phantom{G_n(z)} = 1, \quad n\theta \le z.$$

Hence

$$\lim_{n \to \infty} G_n(z) = 0, \quad z \le 0,$$
$$\phantom{\lim_{n \to \infty} G_n(z)} = 1 - e^{-z/\theta}, \quad 0 < z < \infty.$$

Now

$$G(z) = 0, \quad z < 0,$$
$$\phantom{G(z)} = 1 - e^{-z/\theta}, \quad 0 \le z,$$

is a distribution function that is everywhere continuous and $\lim_{n \to \infty} G_n(z) = G(z)$
at all points. Thus $Z_n$ has a limiting distribution with distribution function
$G(z)$. This affords us an example of a limiting distribution that is not
degenerate.
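The limit in this example is the elementary fact that $(1 - x/n)^n$ tends to $e^{-x}$. The following sketch compares $G_n(z)$ with $G(z)$ for growing $n$; the value $\theta = 2$, the evaluation point $z = 1$, and the function names `G_n` and `G` are assumptions made for illustration.

```python
import math

def G_n(z, n, theta):
    """Distribution function of Z_n = n(theta - Y_n) for a uniform(0, theta) sample."""
    if z < 0:
        return 0.0
    if z < n * theta:
        return 1.0 - (1.0 - z / (n * theta)) ** n
    return 1.0

def G(z, theta):
    """Limiting distribution function: exponential with mean theta."""
    return 0.0 if z < 0 else 1.0 - math.exp(-z / theta)

theta = 2.0   # hypothetical parameter value
for n in (1, 10, 1000, 10**6):
    print(n, G_n(1.0, n, theta), G(1.0, theta))
```

Unlike the preceding examples, the limit here spreads probability over the whole positive half-line, so scaling by $n$ has preserved a nondegenerate shape.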

Example 5. Let $T_n$ have a $t$-distribution with $n$ degrees of freedom,
$n = 1, 2, 3, \ldots$. Thus its distribution function is

$$F_n(t) = \int_{-\infty}^{t} \frac{\Gamma[(n + 1)/2]}{\sqrt{\pi n}\,\Gamma(n/2)}\, \frac{1}{(1 + y^2/n)^{(n+1)/2}}\, dy,$$

where the integrand is the p.d.f. $f_n(y)$ of $T_n$. Accordingly,

$$\lim_{n \to \infty} F_n(t) = \lim_{n \to \infty} \int_{-\infty}^{t} f_n(y)\, dy = \int_{-\infty}^{t} \lim_{n \to \infty} f_n(y)\, dy.$$

The change of the order of the limit and integration is justified because $|f_n(y)|$
is dominated by a function, like $10 f_1(y)$, with a finite integral. That is,

$$|f_n(y)| \le 10 f_1(y)$$

and

$$\int_{-\infty}^{t} 10 f_1(y)\, dy = \frac{10}{\pi}\left(\arctan t + \frac{\pi}{2}\right) < \infty$$

for all real $t$. Hence, here we can find the limiting distribution by finding the
limit of the p.d.f. of $T_n$. It is

$$\lim_{n \to \infty} f_n(y) = \lim_{n \to \infty} \left\{\frac{\Gamma[(n + 1)/2]}{\sqrt{n/2}\,\Gamma(n/2)}\right\} \lim_{n \to \infty} \left\{\frac{1}{(1 + y^2/n)^{1/2}}\right\} \lim_{n \to \infty} \left\{\frac{1}{\sqrt{2\pi}} \left(1 + \frac{y^2}{n}\right)^{-n/2}\right\}.$$

Using the fact from elementary calculus that

$$\lim_{n \to \infty} \left(1 + \frac{y^2}{n}\right)^n = e^{y^2},$$

the limit associated with the third factor is clearly the p.d.f. of the standard
normal distribution. The second limit obviously equals 1. If we knew more
about the gamma function, it is easy to show that the first limit also
equals 1. Thus we have

$$\lim_{n \to \infty} F_n(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-y^2/2}\, dy,$$

and hence $T_n$ has a limiting standard normal distribution.
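The pointwise convergence of the $t$ p.d.f. to the standard normal p.d.f. can be checked directly. This sketch, with hypothetical function names `t_pdf` and `normal_pdf`, works on the log scale through `math.lgamma` so that the gamma functions do not overflow for large $n$.

```python
import math

def t_pdf(y, n):
    """p.d.f. of the t-distribution with n degrees of freedom, computed on the log scale."""
    log_const = (math.lgamma((n + 1) / 2.0) - math.lgamma(n / 2.0)
                 - 0.5 * math.log(math.pi * n))
    return math.exp(log_const - (n + 1) / 2.0 * math.log1p(y * y / n))

def normal_pdf(y):
    """p.d.f. of the standard normal distribution."""
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

for n in (1, 5, 50, 10**6):
    print(n, t_pdf(1.0, n), normal_pdf(1.0))
```

With $n = 1$ the function reduces to the Cauchy p.d.f. $f_1(y) = 1/[\pi(1 + y^2)]$, the dominating function used in the example up to the factor 10.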

EXERCISES

5.1. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a distribution
that is $N(\mu, \sigma^2)$. Find the limiting distribution of $\bar{X}_n$.

5.2. Let $Y_1$ denote the first order statistic of a random sample of size $n$ from
a distribution that has the p.d.f. $f(x) = e^{-(x - \theta)}$, $\theta < x < \infty$, zero elsewhere.
Let $Z_n = n(Y_1 - \theta)$. Investigate the limiting distribution of $Z_n$.

5.3. Let $Y_n$ denote the $n$th order statistic of a random sample from a
distribution of the continuous type that has distribution function $F(x)$
and p.d.f. $f(x) = F'(x)$. Find the limiting distribution of $Z_n = n[1 - F(Y_n)]$.

5.4. Let $Y_2$ denote the second order statistic of a random sample of size $n$
from a distribution of the continuous type that has distribution function
$F(x)$ and p.d.f. $f(x) = F'(x)$. Find the limiting distribution of $W_n = \ldots$

5.5. Let the p.d.f. of $Y_n$ be $f_n(y) = 1$, $y = n$, zero elsewhere. Show that $Y_n$ does
not have a limiting distribution. (In this case, the probability has "escaped"
to infinity.)

5.6. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution that
is $N(\mu, \sigma^2)$, where $\sigma^2 > 0$. Show that the sum $Z_n = \sum_{i=1}^{n} X_i$ does not have a
limiting distribution.


5.2 Convergence in Probability 

In the discussion concerning convergence in distribution, it
was noted that it was really the sequence of distribution functions
that converges to what we call a limiting distribution function.
Convergence in probability is quite different, although we demonstrate
that in a special case there is a relationship between the two concepts.

Definition 2. A sequence of random variables $X_1, X_2, X_3, \ldots$
converges in probability to a random variable $X$ if, for every $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr(|X_n - X| < \varepsilon) = 1,$$

or equivalently,

$$\lim_{n \to \infty} \Pr(|X_n - X| \ge \varepsilon) = 0.$$

We are usually interested in this convergence when the
random variable $X$ is a constant, that is, when the random variable $X$
has a degenerate distribution at that constant. Hence we concentrate
on that situation.

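Example 1. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from
a distribution that has mean $\mu$ and positive variance $\sigma^2$. Then the mean and
variance of $\bar{X}_n$ are $\mu$ and $\sigma^2/n$. Consider, for every fixed $\varepsilon > 0$, the probability

$$\Pr(|\bar{X}_n - \mu| \ge \varepsilon) = \Pr\left(|\bar{X}_n - \mu| \ge \frac{k\sigma}{\sqrt{n}}\right),$$

where $k = \varepsilon\sqrt{n}/\sigma$. In accordance with the inequality of Chebyshev, this
probability is less than or equal to $1/k^2 = \sigma^2/(n\varepsilon^2)$. So, for every fixed $\varepsilon > 0$, we
have

$$\lim_{n \to \infty} \Pr(|\bar{X}_n - \mu| \ge \varepsilon) \le \lim_{n \to \infty} \frac{\sigma^2}{n\varepsilon^2} = 0.$$

Hence $\bar{X}_n$, $n = 1, 2, 3, \ldots$, converges in probability to $\mu$. This result is called
the weak law of large numbers.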
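The Chebyshev bound $\sigma^2/(n\varepsilon^2)$ used in this example can be compared with a simulated tail frequency. The sketch below assumes, purely for illustration, a uniform $(0, 1)$ parent distribution (mean $1/2$, variance $1/12$); the function names, the seed, and the number of replications are likewise hypothetical choices.

```python
import random

def chebyshev_bound(sigma2, n, eps):
    """Upper bound sigma^2 / (n eps^2) on Pr(|Xbar_n - mu| >= eps)."""
    return sigma2 / (n * eps * eps)

def empirical_tail(n, eps, reps=2000, seed=0):
    """Observed fraction of samples of size n from uniform(0, 1) with |xbar - 1/2| >= eps."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        if abs(xbar - 0.5) >= eps:
            count += 1
    return count / reps

eps = 0.1
for n in (10, 100, 400):
    print(n, empirical_tail(n, eps), chebyshev_bound(1.0 / 12.0, n, eps))
```

The bound is typically far from sharp, but it is enough for the argument: it tends to zero as $n$ grows, and the probability it bounds must follow.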


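Remark. A stronger type of convergence is given by $\Pr(\lim_{n \to \infty} Y_n = c) = 1$;
in this case we say that $Y_n$, $n = 1, 2, 3, \ldots$, converges to $c$ with probability 1.
Although we do not consider this type of convergence, it is known that the
mean $\bar{X}_n$ of a random sample converges with probability 1 to the mean $\mu$ of
the distribution, provided that the latter exists. This is one
form of the strong law of large numbers.

We prove a theorem that relates a certain limiting distribution to
convergence in probability to a constant.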
Theorem 1. Let $F_n(y)$ denote the distribution function of a random
variable $Y_n$ whose distribution depends upon the positive integer $n$. Let
$c$ denote a constant which does not depend upon $n$. The sequence $Y_n$,
$n = 1, 2, 3, \ldots$, converges in probability to the constant $c$ if and only if
the limiting distribution of $Y_n$ is degenerate at $y = c$.


Proof. First, assume that $\lim_{n \to \infty} \Pr(|Y_n - c| < \varepsilon) = 1$ for every
$\varepsilon > 0$. We are to prove that the random variable $Y_n$ is such that

$$\lim_{n \to \infty} F_n(y) = 0, \quad y < c,$$
$$\phantom{\lim_{n \to \infty} F_n(y)} = 1, \quad y > c.$$

Note that we do not need to know anything about $\lim_{n \to \infty} F_n(c)$. For
if the limit of $F_n(y)$ is as indicated, then $Y_n$ has a limiting distribution
with distribution function

$$F(y) = 0, \quad y < c,$$
$$\phantom{F(y)} = 1, \quad y \ge c.$$

Now

$$\Pr(|Y_n - c| < \varepsilon) = F_n[(c + \varepsilon)-] - F_n(c - \varepsilon),$$

where $F_n[(c + \varepsilon)-]$ is the left-hand limit of $F_n(y)$ at $y = c + \varepsilon$. Thus
we have

$$1 = \lim_{n \to \infty} \Pr(|Y_n - c| < \varepsilon) = \lim_{n \to \infty} F_n[(c + \varepsilon)-] - \lim_{n \to \infty} F_n(c - \varepsilon).$$

Because $0 \le F_n(y) \le 1$ for all values of $y$ and for every positive integer
$n$, it must be that

$$\lim_{n \to \infty} F_n(c - \varepsilon) = 0, \qquad \lim_{n \to \infty} F_n[(c + \varepsilon)-] = 1.$$

Since this is true for every $\varepsilon > 0$, we have

$$\lim_{n \to \infty} F_n(y) = 0, \quad y < c,$$
$$\phantom{\lim_{n \to \infty} F_n(y)} = 1, \quad y > c,$$

as we were required to show.

To complete the proof of Theorem 1, we assume that

$$\lim_{n \to \infty} F_n(y) = 0, \quad y < c,$$
$$\phantom{\lim_{n \to \infty} F_n(y)} = 1, \quad y > c.$$

We are to prove that $\lim_{n \to \infty} \Pr(|Y_n - c| < \varepsilon) = 1$ for every $\varepsilon > 0$. Because

$$\Pr(|Y_n - c| < \varepsilon) = F_n[(c + \varepsilon)-] - F_n(c - \varepsilon),$$

and because it is given that

$$\lim_{n \to \infty} F_n[(c + \varepsilon)-] = 1,$$
$$\lim_{n \to \infty} F_n(c - \varepsilon) = 0,$$

for every $\varepsilon > 0$, we have the desired result. This completes the proof
of the theorem.

fT" 

For convenience, in the notation of Theorem 1 ， we sometimes say 
that Y n , rather than the sequence Y x , Y 2 , Y 3 ,..., converges in 
probability to the constant c, 

^ m ** 4 

公 V 

EXERCISES 

5.7. Let the random variable $Y_n$ have a distribution that is $b(n, p)$.

(a) Prove that $Y_n/n$ converges in probability to $p$. This result is one form of the weak law of large numbers.

(b) Prove that $1 - Y_n/n$ converges in probability to $1 - p$.

5.8. Let $S_n^2$ denote the variance of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. Prove that $nS_n^2/(n - 1)$ converges in probability to $\sigma^2$.

5.9. Let $W_n$ denote a random variable with mean $\mu$ and variance $b/n^p$, where $p > 0$, $\mu$, and $b$ are constants (not functions of $n$). Prove that $W_n$ converges in probability to $\mu$.

Hint: Use Chebyshev's inequality.

5.10. Let $Y_n$ denote the $n$th order statistic of a random sample of size $n$ from a uniform distribution on the interval $(0, \theta)$, as in Example 1 of Section 5.1. Prove that $Z_n = \sqrt{Y_n}$ converges in probability to $\sqrt{\theta}$.
5.3 Limiting Moment-Generating Functions

To find the limiting distribution function of a random variable $Y_n$ by use of the definition of limiting distribution function obviously requires that we know $F_n(y)$ for each positive integer $n$. But, as indicated in the introductory remarks of Section 5.1, this is precisely the problem we should like to avoid. If it exists, the moment-generating function that corresponds to the distribution function $F_n(y)$ often provides a convenient method of determining the limiting distribution function. To emphasize that the distribution of a random variable $Y_n$ depends upon the positive integer $n$, in this chapter we shall write the moment-generating function of $Y_n$ in the form $M(t; n)$.

The following theorem, which is essentially Curtiss' modification of a theorem of Lévy and Cramér, explains how the moment-generating function may be used in problems of limiting distributions. A proof of the theorem requires a knowledge of that same facet of analysis that permitted us to assert that a moment-generating function, when it exists, uniquely determines a distribution. Accordingly, no proof of the theorem will be given.


Theorem 2. Let the random variable $Y_n$ have the distribution function $F_n(y)$ and the moment-generating function $M(t; n)$ that exists for $-h < t < h$ for all $n$. If there exists a distribution function $F(y)$, with corresponding moment-generating function $M(t)$, defined for $|t| \le h_1 < h$, such that $\lim_{n\to\infty} M(t; n) = M(t)$, then $Y_n$ has a limiting distribution with distribution function $F(y)$.

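In this and the subsequent sections are several illustrations of the use of Theorem 2. In some of these examples it is convenient to use a certain limit that is established in some courses in advanced calculus. We refer to a limit of the form

$$\lim_{n\to\infty} \left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn},$$

where $b$ and $c$ do not depend upon $n$ and where $\lim_{n\to\infty} \psi(n) = 0$. Then

$$\lim_{n\to\infty} \left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn} = \lim_{n\to\infty} \left(1 + \frac{b}{n}\right)^{cn} = e^{bc}.$$

For example,

$$\lim_{n\to\infty} \left(1 - \frac{t^2}{n} + \frac{t^3}{n^{3/2}}\right)^{-n/2} = \lim_{n\to\infty} \left(1 - \frac{t^2}{n} + \frac{t^3/\sqrt{n}}{n}\right)^{-n/2}.$$

Here $b = -t^2$, $c = -\frac{1}{2}$, and $\psi(n) = t^3/\sqrt{n}$. Accordingly, for every fixed value of $t$, the limit is $e^{t^2/2}$.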

Example 1. Let $Y_n$ have a distribution that is $b(n, p)$. Suppose that the mean $\mu = np$ is the same for every $n$; that is, $p = \mu/n$, where $\mu$ is a constant. We shall find the limiting distribution of the binomial distribution, when $p = \mu/n$, by finding the limit of $M(t; n)$. Now

$$M(t; n) = E(e^{tY_n}) = [(1 - p) + pe^t]^n = \left[1 + \frac{\mu(e^t - 1)}{n}\right]^n$$

for all real values of $t$. Hence we have

$$\lim_{n\to\infty} M(t; n) = e^{\mu(e^t - 1)}$$

for all real values of $t$. Since there exists a distribution, namely the Poisson distribution with mean $\mu$, that has this m.g.f. $e^{\mu(e^t - 1)}$, then, in accordance with the theorem and under the conditions stated, it is seen that $Y_n$ has a limiting Poisson distribution with mean $\mu$.

Whenever a random variable has a limiting distribution, we may, if we wish, use the limiting distribution as an approximation to the exact distribution function. The result of this example enables us to use the Poisson distribution as an approximation to the binomial distribution when $n$ is large and $p$ is small. This is clearly an advantage, for it is easy to provide tables for the one-parameter Poisson distribution. On the other hand, the binomial distribution has two parameters, and tables for this distribution are very ungainly. To illustrate the use of the approximation, let $Y$ have a binomial distribution with $n = 50$ and $p = \frac{1}{25}$. Then

$$\Pr(Y \le 1) = \left(\tfrac{24}{25}\right)^{50} + 50\left(\tfrac{1}{25}\right)\left(\tfrac{24}{25}\right)^{49} = 0.400,$$

approximately. Since $\mu = np = 2$, the Poisson approximation to this probability is

$$e^{-2} + 2e^{-2} = 0.406.$$
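The two numbers in this illustration are easy to reproduce. The sketch below, with helper names of our own choosing, computes the exact binomial probability and its Poisson approximation for $n = 50$, $p = 1/25$.

```python
from math import comb, exp

def binom_cdf(k, n, p):
    """Exact Pr(Y <= k) when Y is b(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def poisson_cdf(k, mu):
    """Pr(Y <= k) when Y is Poisson with mean mu."""
    term, total = exp(-mu), 0.0
    for j in range(k + 1):
        total += term
        term *= mu / (j + 1)  # next term: e^{-mu} mu^{j+1} / (j+1)!
    return total

exact = binom_cdf(1, 50, 1 / 25)
approx = poisson_cdf(1, 50 * (1 / 25))
print(round(exact, 3), round(approx, 3))
```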

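Example 2. Let $Z_n$ be $\chi^2(n)$. Then the m.g.f. of $Z_n$ is $(1 - 2t)^{-n/2}$, $t < \frac{1}{2}$. The mean and the variance of $Z_n$ are, respectively, $n$ and $2n$. The limiting distribution of the random variable $Y_n = (Z_n - n)/\sqrt{2n}$ will be investigated. Now the m.g.f. of $Y_n$ is

$$M(t; n) = E\left\{\exp\left[t\left(\frac{Z_n - n}{\sqrt{2n}}\right)\right]\right\} = e^{-tn/\sqrt{2n}}\, E\left(e^{tZ_n/\sqrt{2n}}\right) = \exp\left[-\left(t\sqrt{\frac{2}{n}}\right)\left(\frac{n}{2}\right)\right]\left(1 - \frac{2t}{\sqrt{2n}}\right)^{-n/2}, \qquad t < \frac{\sqrt{2n}}{2}.$$

This may be written in the form

$$M(t; n) = \left(e^{t\sqrt{2/n}} - t\sqrt{\frac{2}{n}}\, e^{t\sqrt{2/n}}\right)^{-n/2}, \qquad t < \sqrt{\frac{n}{2}}.$$

In accordance with Taylor's formula, there exists a number $\xi(n)$, between 0 and $t\sqrt{2/n}$, such that

$$e^{t\sqrt{2/n}} = 1 + t\sqrt{\frac{2}{n}} + \frac{1}{2}\left(t\sqrt{\frac{2}{n}}\right)^2 + \frac{e^{\xi(n)}}{6}\left(t\sqrt{\frac{2}{n}}\right)^3.$$

If this sum is substituted for $e^{t\sqrt{2/n}}$ in the last expression for $M(t; n)$, it is seen that

$$M(t; n) = \left(1 - \frac{t^2}{n} + \frac{\psi(n)}{n}\right)^{-n/2},$$

where

$$\psi(n) = \frac{\sqrt{2}\,t^3 e^{\xi(n)}}{3\sqrt{n}} - \frac{\sqrt{2}\,t^3}{\sqrt{n}} - \frac{2t^4 e^{\xi(n)}}{3n}.$$

Since $\xi(n) \to 0$ as $n \to \infty$, then $\lim \psi(n) = 0$ for every fixed value of $t$. In accordance with the limit proposition cited earlier in this section, we have

$$\lim_{n\to\infty} M(t; n) = e^{t^2/2}$$

for all real values of $t$. That is, the random variable $Y_n = (Z_n - n)/\sqrt{2n}$ has a limiting standard normal distribution.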
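As a numerical companion to Example 2, the following sketch evaluates the m.g.f. of $Y_n = (Z_n - n)/\sqrt{2n}$ for growing $n$ and compares it with $e^{t^2/2}$. The function name is hypothetical, and working on the log scale is our choice (made only to avoid overflow for large $n$), not something the text prescribes.

```python
from math import exp, log1p, sqrt

def mgf_standardized_chisq(t, n):
    """M(t; n) = exp(-t sqrt(n/2)) (1 - t sqrt(2/n))^(-n/2), valid for t < sqrt(n/2)."""
    a = t * sqrt(2.0 / n)
    if a >= 1:
        raise ValueError("the m.g.f. exists only for t < sqrt(n/2)")
    # Work with log M(t; n) so the two large factors cancel before exponentiating.
    log_m = -t * sqrt(n / 2.0) - (n / 2.0) * log1p(-a)
    return exp(log_m)

t = 1.0
for n in (10, 1_000, 100_000, 100_000_000):
    print(n, mgf_standardized_chisq(t, n), exp(t**2 / 2))
```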


EXERCISES 

5.11. Let $X_n$ have a gamma distribution with parameter $\alpha = n$ and $\beta$, where $\beta$ is not a function of $n$. Let $Y_n = X_n/n$. Find the limiting distribution of $Y_n$.

5.12. Let $Z_n$ be $\chi^2(n)$ and let $W_n = Z_n/n^2$. Find the limiting distribution of $W_n$.

5.13. Let $X$ be $\chi^2(50)$. Approximate $\Pr(40 < X < 60)$.

5.14. Let $p = 0.95$ be the probability that a man, in a certain age group, lives at least 5 years.

(a) If we are to observe 60 such men and if we assume independence, find the probability that at least 56 of them live 5 or more years.

(b) Find an approximation to the result of part (a) by using the Poisson distribution.

Hint: Redefine $p$ to be 0.05 and $1 - p = 0.95$.

5.15. Let the random variable $Z_n$ have a Poisson distribution with parameter $\mu = n$. Show that the limiting distribution of the random variable $Y_n = (Z_n - n)/\sqrt{n}$ is normal with mean zero and variance 1.

5.16. Let $S_n^2$ denote the variance of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. It has been proved that $nS_n^2/(n - 1)$ converges in probability to $\sigma^2$. Prove that $S_n^2$ converges in probability to $\sigma^2$.

5.17. Let $X_n$ and $Y_n$ have a bivariate normal distribution with parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$ (free of $n$) but $\rho = 1 - 1/n$. Consider the conditional distribution of $Y_n$, given $X_n = x$. Investigate the limit of this conditional distribution as $n \to \infty$. What is the limiting distribution if $\rho = -1 + 1/n$? Reference to these facts was made in the Remark in Section 2.3.

5.18. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a Poisson distribution with parameter $\mu = 1$.

(a) Show that the m.g.f. of $Y_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma = \sqrt{n}(\bar{X}_n - 1)$ is given by $\exp[-t\sqrt{n} + n(e^{t/\sqrt{n}} - 1)]$.

(b) Investigate the limiting distribution of $Y_n$ as $n \to \infty$.

Hint: Replace, by its MacLaurin's series, the expression $e^{t/\sqrt{n}}$, which is in the exponent of the moment-generating function of $Y_n$.

5.19. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a distribution that has p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere.

(a) Show that the m.g.f. $M(t; n)$ of $Y_n = \sqrt{n}(\bar{X}_n - 1)$ is equal to $[e^{t/\sqrt{n}} - (t/\sqrt{n})e^{t/\sqrt{n}}]^{-n}$, $t < \sqrt{n}$.

(b) Find the limiting distribution of $Y_n$ as $n \to \infty$.

This exercise and the immediately preceding one are special instances of an important theorem that will be proved in the next section.

5.4 The Central Limit Theorem

It was seen (Section 4.8) that, if $X_1, X_2, \ldots, X_n$ is a random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$, the random variable

$$\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} = \frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$$

is, for every positive integer $n$, normally distributed with zero mean and unit variance. In probability theory there is a very elegant theorem called the central limit theorem. A special case of this theorem asserts the remarkable and important fact that if $X_1, X_2, \ldots, X_n$ denote the observations of a random sample of size $n$ from any distribution having positive variance $\sigma^2$ (and hence finite mean $\mu$), then the random variable $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution. If this fact can be established, it will imply, whenever the conditions of the theorem are satisfied, that (for large $n$) the random variable $\sqrt{n}(\bar{X} - \mu)/\sigma$ has an approximate normal distribution with mean zero and variance 1. It will then be possible to use this approximate normal distribution to compute approximate probabilities concerning $\bar{X}$.

The more general form of the theorem is stated, but it is proved only in the modified case. However, this is exactly the proof of the theorem that would be given if we could use the characteristic function in place of the m.g.f.


Theorem 3. Let $X_1, X_2, \ldots, X_n$ denote the observations of a random sample from a distribution that has mean $\mu$ and positive variance $\sigma^2$. Then the random variable $Y_n = \left(\sum_{i=1}^{n} X_i - n\mu\right)/(\sqrt{n}\,\sigma) = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting distribution that is normal with mean zero and variance 1.



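Proof. In the modification of the proof, we assume the existence of the m.g.f. $M(t) = E(e^{tX})$, $-h < t < h$, of the distribution. However, this proof is essentially the same one that would be given for this theorem in a more advanced course by replacing the m.g.f. by the characteristic function $\varphi(t) = E(e^{itX})$.

The function

$$m(t) = E[e^{t(X - \mu)}] = e^{-\mu t} M(t)$$

also exists for $-h < t < h$. Since $m(t)$ is the m.g.f. for $X - \mu$, it must follow that $m(0) = 1$, $m'(0) = E(X - \mu) = 0$, and $m''(0) = E[(X - \mu)^2] = \sigma^2$. By Taylor's formula there exists a number $\xi$ between 0 and $t$ such that

$$m(t) = m(0) + m'(0)t + \frac{m''(\xi)t^2}{2} = 1 + \frac{m''(\xi)t^2}{2}.$$

If $\sigma^2 t^2/2$ is added and subtracted, then

$$m(t) = 1 + \frac{\sigma^2 t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2}. \qquad (1)$$

Next consider $M(t; n)$, where

$$M(t; n) = E\left[\exp\left(t\,\frac{\sum X_i - n\mu}{\sigma\sqrt{n}}\right)\right] = \left\{E\left[\exp\left(t\,\frac{X - \mu}{\sigma\sqrt{n}}\right)\right]\right\}^n = \left[m\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^n, \qquad -h < \frac{t}{\sigma\sqrt{n}} < h,$$

the second equality following from the independence of $X_1, X_2, \ldots, X_n$. In Equation (1) replace $t$ by $t/\sigma\sqrt{n}$ to obtain

$$m\left(\frac{t}{\sigma\sqrt{n}}\right) = 1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2},$$

where now $\xi$ is between 0 and $t/\sigma\sqrt{n}$ with $-h\sigma\sqrt{n} < t < h\sigma\sqrt{n}$. Accordingly,

$$M(t; n) = \left\{1 + \frac{t^2}{2n} + \frac{[m''(\xi) - \sigma^2]t^2}{2n\sigma^2}\right\}^n.$$

Since $m''(t)$ is continuous at $t = 0$ and since $\xi \to 0$ as $n \to \infty$, we have

$$\lim_{n\to\infty} [m''(\xi) - \sigma^2] = 0.$$

The limit proposition cited in Section 5.3 shows that

$$\lim_{n\to\infty} M(t; n) = e^{t^2/2}$$

for all real values of $t$. This proves that the random variable $Y_n = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting standard normal distribution.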

We interpret this theorem as saying that, when $n$ is a large, fixed positive integer, the random variable $\bar{X}$ has an approximate normal distribution with mean $\mu$ and variance $\sigma^2/n$; and in applications we use the approximate normal p.d.f. as though it were the exact p.d.f. of $\bar{X}$.

Some illustrative examples, here and later, will help show the importance of this version of the central limit theorem.

Example 1. Let $\bar{X}$ denote the mean of a random sample of size 75 from the distribution that has the p.d.f.

$$f(x) = 1, \quad 0 < x < 1,$$
$$\phantom{f(x)} = 0 \quad \text{elsewhere}.$$

It was stated in Section 5.1 that the exact p.d.f. of $\bar{X}$, say $g(\bar{x})$, is rather complicated. It can be shown that $g(\bar{x})$ has a graph when $0 < \bar{x} < 1$ that is composed of arcs of 75 different polynomials of degree 74. The computation of such a probability as $\Pr(0.45 < \bar{X} < 0.55)$ would be extremely laborious. The conditions of the theorem are satisfied, since $M(t)$ exists for all real values of $t$. Moreover, $\mu = \frac{1}{2}$ and $\sigma^2 = \frac{1}{12}$, so that we have approximately

$$\Pr(0.45 < \bar{X} < 0.55) = \Pr\left[\frac{\sqrt{n}(0.45 - \mu)}{\sigma} < \frac{\sqrt{n}(\bar{X} - \mu)}{\sigma} < \frac{\sqrt{n}(0.55 - \mu)}{\sigma}\right]$$
$$= \Pr[-1.5 < 30(\bar{X} - 0.5) < 1.5]$$
$$= 0.866,$$

from Table III in Appendix B.
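The table lookup in this example can be reproduced, and compared with a simulation, in a few lines. This is only a sketch: the helper names, the seed, and the number of replications are our assumptions, and the standard normal distribution function is computed from the error function rather than read from Table III.

```python
import random
from math import erf, sqrt

def phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_approx(a, b, mu, sigma2, n):
    """Central limit theorem approximation to Pr(a < Xbar < b)."""
    scale = sqrt(sigma2 / n)
    return phi((b - mu) / scale) - phi((a - mu) / scale)

def simulate(a, b, n, reps, seed=1):
    """Monte Carlo estimate of Pr(a < Xbar < b) for uniform(0, 1) samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        hits += a < xbar < b
    return hits / reps

approx = normal_approx(0.45, 0.55, 0.5, 1 / 12, 75)
estimate = simulate(0.45, 0.55, 75, 20_000)
print(round(approx, 3), round(estimate, 3))
```

The simulated relative frequency should land near the normal approximation, which is the practical content of the theorem for $n = 75$.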

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $b(1, p)$. Here $\mu = p$, $\sigma^2 = p(1 - p)$, and $M(t)$ exists for all real values of $t$. If $Y_n = X_1 + \cdots + X_n$, it is known that $Y_n$ is $b(n, p)$. Calculation of probabilities concerning $Y_n$, when we do not use the Poisson approximation, can be greatly simplified by making use of the fact that $(Y_n - np)/\sqrt{np(1 - p)} = \sqrt{n}(\bar{X}_n - p)/\sqrt{p(1 - p)} = \sqrt{n}(\bar{X}_n - \mu)/\sigma$ has a limiting distribution that is normal with mean zero and variance 1. Frequently, statisticians say that $Y_n$, or more simply $Y$, has an approximate normal distribution with mean $np$ and variance $np(1 - p)$. Even with $n$ as small as 10, with $p = \frac{1}{2}$ so that the binomial distribution is symmetric about $np = 5$, we note in Figure 5.1 how well the normal distribution, $N(5, \frac{5}{2})$, fits the binomial distribution, $b(10, \frac{1}{2})$, where the heights of the rectangles represent the probabilities of the respective integers $0, 1, 2, \ldots, 10$. Note that the area of the rectangle whose base is $(k - 0.5, k + 0.5)$ and the area under the normal p.d.f. between $k - 0.5$ and $k + 0.5$ are approximately equal for each $k = 0, 1, 2, \ldots, 10$, even with $n = 10$. This example should help the reader understand Example 3.

FIGURE 5.1

Example 3. With the background of Example 2, let $n = 100$ and $p = \frac{1}{2}$, and suppose that we wish to compute $\Pr(Y = 48, 49, 50, 51, 52)$. Since $Y$ is a random variable of the discrete type, the events $Y = 48, 49, 50, 51, 52$ and $47.5 < Y < 52.5$ are equivalent. That is, $\Pr(Y = 48, 49, 50, 51, 52) = \Pr(47.5 < Y < 52.5)$. Since $np = 50$ and $np(1 - p) = 25$, the latter probability may be written

$$\Pr(47.5 < Y < 52.5) = \Pr\left(\frac{47.5 - 50}{5} < \frac{Y - 50}{5} < \frac{52.5 - 50}{5}\right)$$
$$= \Pr\left(-0.5 < \frac{Y - 50}{5} < 0.5\right).$$

Since $(Y - 50)/5$ has an approximate normal distribution with mean zero and variance 1, Table III shows this probability to be approximately 0.382.

The convention of selecting the event $47.5 < Y < 52.5$, instead of, say, $47.8 < Y < 52.3$, as the event equivalent to the event $Y = 48, 49, 50, 51, 52$ seems to have originated in the following manner: The probability, $\Pr(Y = 48, 49, 50, 51, 52)$, can be interpreted as the sum of five rectangular areas where the rectangles have bases 1 but the heights are, respectively, $\Pr(Y = 48), \ldots, \Pr(Y = 52)$. If these rectangles are so located that the midpoints of their bases are, respectively, at the points $48, 49, \ldots, 52$ on a horizontal axis, then in approximating the sum of these areas by an area bounded by the horizontal axis, the graph of a normal p.d.f., and two ordinates, it seems reasonable to take the two ordinates at the points 47.5 and 52.5.
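To see how well the half-unit convention works in Example 3, the following sketch (helper names are ours) computes the exact binomial probability and the normal approximation with ordinates at 47.5 and 52.5.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binom_prob_range(lo, hi, n, p):
    """Exact Pr(lo <= Y <= hi) when Y is b(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def normal_with_half_units(lo, hi, n, p):
    """Normal approximation using ordinates at lo - 0.5 and hi + 0.5."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    return phi((hi + 0.5 - mu) / sd) - phi((lo - 0.5 - mu) / sd)

exact = binom_prob_range(48, 52, 100, 0.5)
approx = normal_with_half_units(48, 52, 100, 0.5)
print(round(exact, 4), round(approx, 4))
```

The two values agree to about three decimal places, which supports taking the ordinates at the midpoints between integers.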

We know that $\bar{X}$ and $\sum_{i=1}^{n} X_i$ have approximate normal distributions, provided that $n$ is large enough. Later, we find that other statistics also have approximate normal distributions, and this is the reason that the normal distribution is so important to statisticians. That is, while not many underlying distributions are normal, the distributions of statistics calculated from random samples arising from these distributions are often very close to being normal.

Frequently, we are interested in functions of statistics that have approximate normal distributions. For illustration, $Y_n$ of Example 2 has an approximate $N[np, np(1 - p)]$. So $np(1 - p)$ is an important function of $p$ as it is the variance of $Y_n$. Thus, if $p$ is unknown, we might want to estimate the variance of $Y_n$. Since $E(Y_n/n) = p$, we might use $n(Y_n/n)(1 - Y_n/n)$ as such an estimator and would want to know something about the latter's distribution. In particular, does it also have an approximate normal distribution? If so, what are its mean and variance? To answer questions like these, we use a procedure that is commonly called the delta method, which will be explained using the sample mean $\bar{X}_n$ as the statistic.

We know that $\bar{X}_n$ converges in probability to $\mu$ and $\bar{X}_n$ is approximately $N(\mu, \sigma^2/n)$. Suppose that we are interested in a function of $\bar{X}_n$, say $u(\bar{X}_n)$. Since, for large $n$, $\bar{X}_n$ is close to $\mu$, we can approximate $u(\bar{X}_n)$ by the first two terms of Taylor's expansion about $\mu$, namely

$$u(\bar{X}_n) \approx v(\bar{X}_n) = u(\mu) + (\bar{X}_n - \mu)u'(\mu),$$

where $u'(\mu)$ exists and is not zero. Since $v(\bar{X}_n)$ is a linear function of $\bar{X}_n$, it has an approximate normal distribution with mean

$$E[v(\bar{X}_n)] = u(\mu) + E[(\bar{X}_n - \mu)]u'(\mu) = u(\mu)$$

and variance

$$\operatorname{var}[v(\bar{X}_n)] = [u'(\mu)]^2 \operatorname{var}(\bar{X}_n - \mu) = [u'(\mu)]^2\,\frac{\sigma^2}{n}.$$

Now, for large $n$, $u(\bar{X}_n)$ is approximately equal to $v(\bar{X}_n)$; so it has the same approximating distribution. That is, $u(\bar{X}_n)$ is approximately $N\{u(\mu), [u'(\mu)]^2\sigma^2/n\}$. More formally, we could say that

$$\frac{u(\bar{X}_n) - u(\mu)}{\sqrt{[u'(\mu)]^2\sigma^2/n}}$$

has a limiting standard normal distribution.
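A short simulation illustrates the delta method. The setup is an assumption made only for this sketch: samples of size $n = 200$ from the uniform distribution on $(0, 1)$, so that $\mu = 1/2$ and $\sigma^2 = 1/12$, with $u(x) = x^2$ and hence $u'(\mu) = 1$. The function names, seed, and replication count are likewise ours.

```python
import random
from statistics import mean, variance

def delta_method_variance(u_prime_at_mu, sigma2, n):
    """Approximate variance [u'(mu)]^2 sigma^2 / n of u(Xbar_n)."""
    return u_prime_at_mu**2 * sigma2 / n

def simulate_u_of_mean(u, n, reps, seed=0):
    """Draw reps values of u(Xbar_n) for uniform(0, 1) samples of size n."""
    rng = random.Random(seed)
    return [u(sum(rng.random() for _ in range(n)) / n) for _ in range(reps)]

n, reps = 200, 4000
draws = simulate_u_of_mean(lambda x: x * x, n, reps)
approx_var = delta_method_variance(2 * 0.5, 1 / 12, n)

print("mean of u(Xbar):", mean(draws), "vs u(mu) =", 0.25)
print("variance of u(Xbar):", variance(draws), "vs delta method:", approx_var)
```

The sample mean and variance of the simulated $u(\bar{X}_n)$ should be close to $u(\mu)$ and $[u'(\mu)]^2\sigma^2/n$, up to simulation error and terms of smaller order in $n$.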


Example 4. Let $Y_n$ (or $Y$ for simplicity) be $b(n, p)$. Thus $Y/n$ is approximately $N[p, p(1 - p)/n]$. Statisticians often look for functions of statistics whose variances do not depend upon the parameter. Here the variance of $Y/n$ depends upon $p$. Can we find a function, say $u(Y/n)$, whose variance is essentially free of $p$? Since $Y/n$ converges in probability to $p$, we can approximate $u(Y/n)$ by the first two terms of its Taylor's expansion about $p$, namely by

$$v\left(\frac{Y}{n}\right) = u(p) + \left(\frac{Y}{n} - p\right)u'(p).$$

Of course, $v(Y/n)$ is a linear function of $Y/n$ and thus also has an approximate normal distribution; clearly, it has mean $u(p)$ and variance

$$[u'(p)]^2\,\frac{p(1 - p)}{n}.$$

But it is the latter that we want to be essentially free of $p$; thus we set it equal to a constant, obtaining the differential equation

$$u'(p) = \frac{c}{\sqrt{p(1 - p)}}.$$

A solution of this is

$$u(p) = (2c)\arcsin\sqrt{p}.$$

If we take $c = \frac{1}{2}$, we have, since $u(Y/n)$ is approximately equal to $v(Y/n)$, that

$$u\left(\frac{Y}{n}\right) = \arcsin\sqrt{\frac{Y}{n}}.$$

This has an approximate normal distribution with mean $\arcsin\sqrt{p}$ and variance $1/4n$, which is free of $p$.
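The claim that the variance of $\arcsin\sqrt{Y/n}$ is essentially free of $p$ can be checked by exact computation from the binomial probabilities. In this sketch $n = 400$ and the three values of $p$ are illustrative choices, and the function name is hypothetical.

```python
from math import asin, comb, sqrt

def exact_var_arcsine(n, p):
    """Exact variance of arcsin(sqrt(Y/n)) when Y is b(n, p)."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    vals = [asin(sqrt(k / n)) for k in range(n + 1)]
    m = sum(w * v for w, v in zip(pmf, vals))
    return sum(w * (v - m) ** 2 for w, v in zip(pmf, vals))

n = 400
for p in (0.2, 0.5, 0.8):
    # var(Y/n) = p(1 - p)/n changes with p; the transformed variance stays near 1/(4n).
    print(p, p * (1 - p) / n, exact_var_arcsine(n, p), 1 / (4 * n))
```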


EXERCISES 

5.20. Let $\bar{X}$ denote the mean of a random sample of size 100 from a distribution that is $\chi^2(50)$. Compute an approximate value of $\Pr(49 < \bar{X} < 51)$.

5.21. Let $\bar{X}$ denote the mean of a random sample of size 128 from a gamma distribution with $\alpha = 2$ and $\beta = 4$. Approximate $\Pr(7 < \bar{X} < 9)$.

5,22* Let Y be A(72, 士 ). Approximate Pr (22 < Y < 28)- 

5.23. Compute an approximate probability that the mean of a random sample
of size 15 from a distribution having p.d.f. $f(x) = 3x^2$, $0 < x < 1$, zero
elsewhere, is between | and 

5.24. Let $Y$ denote the sum of the observations of a random sample of size 12 from a distribution having p.d.f. $f(x) = \frac{1}{6}$, $x = 1, 2, 3, 4, 5, 6$, zero elsewhere. Compute an approximate value of $\Pr(36 \le Y \le 48)$.

Hint: Since the event of interest is $Y = 36, 37, \ldots, 48$, rewrite the probability as $\Pr(35.5 < Y < 48.5)$.

5.25. Let Yhc 6(400, \). Compute an approximate value of Pr (0.25 < Yjn), 

5.26. If 7 is 6(100,!), approximate the value of Pr (F = 50). 

5.27. Let Ybe b(n, 0.55). Find the smallest value of n so that (approximately) 
Pr()7n>i)>0.95. 

5.28. Let $f(x) = 1/x^2$, $1 < x < \infty$, zero elsewhere, be the p.d.f. of a random variable $X$. Consider a random sample of size 72 from the distribution having this p.d.f. Compute approximately the probability that more than 50 of the observations of the random sample are less than 3.

5.29. Forty-eight measurements are recorded to several decimal places. Each of these 48 numbers is rounded off to the nearest integer. The sum of the original 48 numbers is approximated by the sum of these integers. If we assume that the errors made by rounding off are i.i.d. and have uniform distributions over the interval $(-\frac{1}{2}, \frac{1}{2})$, compute approximately the probability that the sum of the integers is within 2 units of the true sum.

5.30. We know that X is approximately N(/a, ^jn) for large n. Find the 
approximate distribution of u(X) = X 1 . 

5.31. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\mu$. Thus $Y = \sum_{i=1}^{n} X_i$ has a Poisson distribution with mean $n\mu$. Moreover, $\bar{X} = Y/n$ is approximately $N(\mu, \mu/n)$ for large $n$. Show that $u(Y/n) = \sqrt{Y/n}$ is a function of $Y/n$ whose variance is essentially free of $\mu$.

5.5 Some Theorems on Limiting Distributions

In this section, we shall present some theorems that can often be used to simplify the study of certain limiting distributions.

Theorem 4. Let $F_n(u)$ denote the distribution function of a random variable $U_n$ whose distribution depends upon the positive integer $n$. Let $U_n$ converge in probability to the constant $c \ne 0$. The random variable $U_n/c$ converges in probability to 1.

The proof of this theorem is very easy and is left as an exercise.

Theorem 5. Let $F_n(u)$ denote the distribution function of a random variable $U_n$ whose distribution depends upon the positive integer $n$. Further, let $U_n$ converge in probability to the positive constant $c$ and let $\Pr(U_n < 0) = 0$ for every $n$. The random variable $\sqrt{U_n}$ converges in probability to $\sqrt{c}$.

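Proof. We are given that $\lim_{n\to\infty} \Pr(|U_n - c| \ge \epsilon) = 0$ for every $\epsilon > 0$. We are to prove that $\lim_{n\to\infty} \Pr(|\sqrt{U_n} - \sqrt{c}| \ge \epsilon') = 0$ for every $\epsilon' > 0$. Now the probability

$$\Pr(|U_n - c| \ge \epsilon) = \Pr[|(\sqrt{U_n} - \sqrt{c})(\sqrt{U_n} + \sqrt{c})| \ge \epsilon]$$
$$= \Pr\left(|\sqrt{U_n} - \sqrt{c}| \ge \frac{\epsilon}{\sqrt{U_n} + \sqrt{c}}\right)$$
$$\ge \Pr\left(|\sqrt{U_n} - \sqrt{c}| \ge \frac{\epsilon}{\sqrt{c}}\right) \ge 0.$$

If we let $\epsilon' = \epsilon/\sqrt{c}$, and if we take the limit, as $n$ becomes infinite, we have

$$0 = \lim_{n\to\infty} \Pr(|U_n - c| \ge \epsilon) \ge \lim_{n\to\infty} \Pr(|\sqrt{U_n} - \sqrt{c}| \ge \epsilon') \ge 0$$

for every $\epsilon' > 0$. This completes the proof.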

The conclusions of Theorems 4 and 5 are very natural ones and they certainly appeal to our intuition. There are many other theorems of this flavor in probability theory. As exercises, it is to be shown that if the random variables $U_n$ and $V_n$ converge in probability to the respective constants $c$ and $d$, then $U_nV_n$ converges in probability to the constant $cd$, and $U_n/V_n$ converges in probability to the constant $c/d$, provided that $d \ne 0$. However, we shall accept, without proof, the following theorem, which is a modification of Slutsky's theorem.

Theorem 6. Let $F_n(u)$ denote the distribution function of a random variable $U_n$ whose distribution depends upon the positive integer $n$. Let $U_n$ have a limiting distribution with distribution function $F(u)$. Let a random variable $V_n$ converge in probability to 1. The limiting distribution of the random variable $W_n = U_n/V_n$ is the same as that of $U_n$; that is, $W_n$ has a limiting distribution with distribution function $F(w)$.

Example 1. Let $Y_n$ denote a random variable that is $b(n, p)$, $0 < p < 1$. We know that

$$U_n = \frac{Y_n - np}{\sqrt{np(1 - p)}}$$

has a limiting distribution that is $N(0, 1)$. Moreover, it has been proved that $Y_n/n$ and $1 - Y_n/n$ converge in probability to $p$ and $1 - p$, respectively; thus $(Y_n/n)(1 - Y_n/n)$ converges in probability to $p(1 - p)$. Then, by Theorem 4, $(Y_n/n)(1 - Y_n/n)/[p(1 - p)]$ converges in probability to 1, and Theorem 5 asserts that the following does also:

$$V_n = \left[\frac{(Y_n/n)(1 - Y_n/n)}{p(1 - p)}\right]^{1/2}.$$

Thus, in accordance with Theorem 6, the ratio $W_n = U_n/V_n$, namely

$$\frac{Y_n - np}{\sqrt{n(Y_n/n)(1 - Y_n/n)}},$$

has a limiting distribution that is $N(0, 1)$. This fact enables us to write (with $n$ a large, fixed positive integer)

$$\Pr\left[-2 < \frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} < 2\right] = 0.954,$$

approximately.
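The probability statement at the end of this example can be examined exactly for a particular case. The sketch below assumes $n = 400$ and $p = 0.3$ purely for illustration (the function name is ours) and sums the binomial probabilities of those $y$ for which the ratio lies strictly between $-2$ and $2$.

```python
from math import comb, sqrt

def prob_ratio_within(n, p, z=2.0):
    """Exact Pr(-z < (Y - np) / sqrt(n (Y/n)(1 - Y/n)) < z) when Y is b(n, p)."""
    total = 0.0
    for y in range(n + 1):
        phat = y / n
        denom = sqrt(n * phat * (1 - phat))
        # Written as a product so that y = 0 and y = n (zero denominator) are excluded.
        if abs(y - n * p) < z * denom:
            total += comb(n, y) * p**y * (1 - p) ** (n - y)
    return total

print(prob_ratio_within(400, 0.3))
```

Because $Y$ is discrete, the exact value fluctuates somewhat around 0.954 as $n$ and $p$ vary; the limiting result only says it approaches 0.954 as $n$ increases.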


Example 2. Let $\bar{X}_n$ and $S_n^2$ denote, respectively, the mean and the variance of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$, $\sigma^2 > 0$. It has been proved that $\bar{X}_n$ converges in probability to $\mu$ and that $S_n^2$ converges in probability to $\sigma^2$. Theorem 5 asserts that $S_n$ converges in probability to $\sigma$ and Theorem 4 tells us that $S_n/\sigma$ converges in probability to 1. In accordance with Theorem 6, the random variable $W_n = \sigma\bar{X}_n/S_n$ has the same limiting distribution as does $\bar{X}_n$. That is, $\sigma\bar{X}_n/S_n$ converges in probability to $\mu$.


EXERCISES 


5.32. Prove Theorem 4.

Hint: Note that $\Pr(|U_n/c - 1| < \epsilon) = \Pr(|U_n - c| < \epsilon|c|)$, for every $\epsilon > 0$. Then take $\epsilon' = \epsilon|c|$.

5.33. Let $\bar{X}_n$ denote the mean of a random sample of size $n$ from a gamma distribution with parameters $\alpha = \mu > 0$ and $\beta = 1$. Show that the limiting distribution of $\sqrt{n}(\bar{X}_n - \mu)/\sqrt{\bar{X}_n}$ is $N(0, 1)$.

5.34. Let $T_n = (\bar{X}_n - \mu)/\sqrt{S_n^2/(n - 1)}$, where $\bar{X}_n$ and $S_n^2$ represent, respectively, the mean and the variance of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. Prove that the limiting distribution of $T_n$ is $N(0, 1)$.

5.35. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be the observations of two independent random samples, each of size $n$, from the distributions that have the respective means $\mu_1$ and $\mu_2$ and the common variance $\sigma^2$. Find the limiting distribution of

$$\frac{(\bar{X}_n - \bar{Y}_n) - (\mu_1 - \mu_2)}{\sigma\sqrt{2/n}},$$

where $\bar{X}_n$ and $\bar{Y}_n$ are the respective means of the samples.

Hint: Let $\bar{Z}_n = \sum_{i=1}^{n} Z_i/n$, where $Z_i = X_i - Y_i$.

5.36. Let $U_n$ and $V_n$ converge in probability to $c$ and $d$, respectively. Prove the following.

(a) The sum $U_n + V_n$ converges in probability to $c + d$.

Hint: Show that $\Pr(|U_n + V_n - c - d| \ge \epsilon) \le \Pr(|U_n - c| + |V_n - d| \ge \epsilon) \le \Pr(|U_n - c| \ge \epsilon/2 \text{ or } |V_n - d| \ge \epsilon/2) \le \Pr(|U_n - c| \ge \epsilon/2) + \Pr(|V_n - d| \ge \epsilon/2)$.

(b) The product $U_nV_n$ converges in probability to $cd$.

(c) If $d \ne 0$, the ratio $U_n/V_n$ converges in probability to $c/d$.

5.37. Let $U_n$ converge in probability to $c$. If $h(u)$ is a continuous function at $u = c$, prove that $h(U_n)$ converges in probability to $h(c)$.

Hint: For each $\epsilon > 0$, there exists a $\delta > 0$ such that $\Pr[|h(U_n) - h(c)| < \epsilon] \ge \Pr[|U_n - c| < \delta]$. Why?

ADDITIONAL EXERCISES 

5.38. A nail manufacturer guarantees that not more than one nail in a box of 100 nails is defective. If, in fact, the probability of each individual nail being defective is $p = 0.005$, compute the probability that:

(a) The next box of nails violates the guarantee. Use the Poisson approximation, after assuming independence.

(b) The guarantee is violated at least once in the next 25 boxes.

5.39. Let $\bar{X}_n$ and $\bar{Y}_n$ be the means of two independent random samples of size $n$ from a distribution having variance $\sigma^2$. Determine $n$ so that $\Pr(|\bar{X}_n - \bar{Y}_n| < \sigma/2) = 0.98$, approximately.

5.40. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from a distribution with p.d.f. $f(x) = 6x(1 - x)$, $0 < x < 1$, zero elsewhere. Find $\Pr[0.48 < \bar{X} < 0.52]$ approximately.

5.41. A rolls an unbiased die 100 independent times and B rolls an unbiased die 100 independent times. What is the approximate probability that A will total at least 25 points more than B?

5.42. Compute, approximately, the probability that the sum of the observations of a random sample of size 24 from a chi-square distribution with 3 degrees of freedom is between 70 and 80.

5.43. Let $X$ be the number of times that no heads appear on two coins when these two coins are tossed together $n$ times. Find the smallest value of $n$ so that $\Pr(0.24 < X/n < 0.26) > 0.954$, approximately.

5.44. Two persons have 16 and 32 dollars, respectively. They bet one dollar 
on each of 900 independent tosses of an unbiased coin. What is an 
approximation to the probability that neither person is in debt at the end 
of the 900 trials? 

5.45. A die is rolled 720 independent times. Compute, approximately, the probability that the number of fives that appear will be between 110 and 125 inclusive.

5.46. A part is produced with a mean of 6.2 ounces and a standard deviation of 0.2 ounce. What is the probability that the weight of 100 such items is between 616 and 624 ounces?

5.47. Let $X_1, X_2, \ldots, X_{25}$ be a random sample of size 25 from a distribution having p.d.f. $f(x) = x/6$, $x = 1, 2, 3$, zero elsewhere. Approximate

$$\Pr\left(\sum_{i=1}^{25} X_i = 50, 51, \ldots, \text{or } 60\right).$$

5.48. Say that a lot of 1000 items contains 20 defective items. A sample of size 50 is taken at random and without replacement from the lot. If 3 or fewer defective items are found in this sample, the lot is accepted. Approximate the probability of accepting this lot.

5.49. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having finite $E(X^m)$, $m > 0$. Show that $\sum_{i=1}^{n} X_i^m/n$ converges in probability to $E(X^m)$. Was an additional assumption needed?

5.50. It can be proved that the mean $\bar{X}_n$ of a random sample of size $n$ from a Cauchy distribution has that same Cauchy distribution for every $n$. Thus $\bar{X}_n$ does not converge in probability to zero. How can this be, as earlier, under certain conditions, we proved that $\bar{X}_n$ converges in probability to the mean of the distribution?

5.51. Let $Y$ be $\chi^2(n)$. What is the limiting distribution of $Z = \sqrt{Y} - \sqrt{n}$?

5.52. Let $\bar{X}$ be the mean of a random sample of size $n$ from a Poisson distribution with parameter $\mu$. Find the function $Y = u(\bar{X})$ so that $Y$ has an approximate normal distribution with mean $u(\mu)$ and variance that is free of $\mu$.
5.53. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a distribution with distribution function $F(x)$ and p.d.f. $f(x) = F'(x)$. Say $F(\xi_p) = p$ and $f(\xi_p) > 0$. Consider the order statistic $Y_{[np]}$, where $[np]$ is the greatest integer in $np$.

(a) Note that the event $\sqrt{n}(Y_{[np]} - \xi_p) \le u$ is equivalent to $Z \ge [np]$, where $Z$ is the number of $X$-values less than or equal to $\xi_p + u/\sqrt{n}$.

(b) Write $Z \ge np$, an approximation to $Z \ge [np]$, as

$$\frac{Z - nF(\xi_p + u/\sqrt{n})}{\sqrt{np(1 - p)}} \ge \frac{-f(\xi_p)\,u}{\sqrt{p(1 - p)}}, \quad \text{approximately},$$

using $F(\xi_p + u/\sqrt{n}) \approx p + f(\xi_p)u/\sqrt{n}$.

(c) Since the left-hand member of the inequality in part (b) is approximately $N(0, 1)$, argue that $Y_{[np]}$ has an approximate normal distribution with mean $\xi_p$ and variance $p(1 - p)/\{n[f(\xi_p)]^2\}$.
6.1 Point Estimation

The first five chapters of this book deal with certain concepts and problems of probability theory. Throughout we have carefully distinguished between a sample space $\mathcal{C}$ of outcomes and the space $\mathcal{A}$ of one or more random variables defined on $\mathcal{C}$. With this chapter we begin a study of some problems in statistics and here we are more interested in the number (or numbers) by which an outcome is represented than we are in the outcome itself. Accordingly, we shall adopt a frequently used convention. We shall refer to a random variable $X$ as the outcome of a random experiment and we shall refer to the space of $X$ as the sample space. Were it not so awkward, we would call $X$ the numerical outcome. Once the experiment has been performed and it is found that $X = x$, we shall call $x$ the experimental value of $X$ for that performance of the experiment.


Introduction to Statistical Inference [Ch. 6


This convenient terminology can be used to advantage in more general situations. To illustrate this, let a random experiment be repeated $n$ independent times and under identical conditions. Then the random variables $X_1, X_2, \ldots, X_n$ (each of which assigns a numerical value to an outcome) constitute (Section 4.1) the observations of a random sample. If we are more concerned with the numerical representations of the outcomes than with the outcomes themselves, it seems natural to refer to $X_1, X_2, \ldots, X_n$ as the outcomes. And what more appropriate name can we give to the space of a random sample than the sample space? Once the experiment has been performed the indicated number of times and it is found that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, we shall refer to $x_1, x_2, \ldots, x_n$ as the experimental values of $X_1, X_2, \ldots, X_n$ or as the sample data.

We shall use the terminology of the two preceding paragraphs, and
in this section we shall give some examples of statistical inference. These
examples will be built around the notion of a point estimate of an
unknown parameter in a p.d.f.

Let a random variable $X$ have a p.d.f. that is of known functional
form but in which the p.d.f. depends upon an unknown parameter $\theta$
that may have any value in a set $\Omega$. This will be denoted by writing the
p.d.f. in the form $f(x; \theta)$, $\theta \in \Omega$. The set $\Omega$ will be called the parameter
space. Thus we are confronted, not with one distribution of probability,
but with a family of distributions. To each value of $\theta \in \Omega$, there
corresponds one member of the family. A family of probability density
functions will be denoted by the symbol $\{f(x; \theta) : \theta \in \Omega\}$. Any member
of this family of probability density functions will be denoted by the
symbol $f(x; \theta)$, $\theta \in \Omega$. We shall continue to use the special symbols that
have been adopted for the normal, the chi-square, and the binomial
distributions. We may, for instance, have the family $\{N(\theta, 1) : \theta \in \Omega\}$,
where $\Omega$ is the set $-\infty < \theta < \infty$. One member of this family of
distributions is the distribution that is $N(0, 1)$. Any arbitrary member
is $N(\theta, 1)$, $-\infty < \theta < \infty$.

Let us consider a family of probability density functions
$\{f(x; \theta) : \theta \in \Omega\}$. It may be that the experimenter needs to select
precisely one member of the family as being the p.d.f. of his random
variable. That is, he needs a point estimate of $\theta$. Let $X_1, X_2, \ldots, X_n$
denote a random sample from a distribution that has a p.d.f. which is
one member (but which member we do not know) of the family
$\{f(x; \theta) : \theta \in \Omega\}$ of probability density functions. That is, our sample
arises from a distribution that has the p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Our
problem is that of defining a statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, so
that if $x_1, x_2, \ldots, x_n$ are the observed experimental values of
$X_1, X_2, \ldots, X_n$, then the number $y_1 = u_1(x_1, x_2, \ldots, x_n)$ will be a good
point estimate of $\theta$.

The following illustration should help motivate one principle that
is often used in finding point estimates.

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the
distribution with p.d.f.

$$f(x) = \theta^x (1 - \theta)^{1-x}, \quad x = 0, 1,$$

and $f(x) = 0$ elsewhere, where $0 \le \theta \le 1$. The probability that
$X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ is the joint p.d.f.

$$\theta^{x_1}(1-\theta)^{1-x_1}\,\theta^{x_2}(1-\theta)^{1-x_2} \cdots \theta^{x_n}(1-\theta)^{1-x_n} = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i},$$

where $x_i$ equals zero or 1, $i = 1, 2, \ldots, n$. This probability, which is the joint
p.d.f. of $X_1, X_2, \ldots, X_n$, may be regarded as a function of $\theta$ and, when so
regarded, is denoted by $L(\theta)$ and called the likelihood function. That is,

$$L(\theta) = \theta^{\sum x_i}(1-\theta)^{n - \sum x_i}, \quad 0 \le \theta \le 1.$$

We might ask what value of $\theta$ would maximize the probability $L(\theta)$ of
obtaining this particular observed sample $x_1, x_2, \ldots, x_n$. Certainly, this
maximizing value of $\theta$ would seemingly be a good estimate of $\theta$ because it
would provide the largest probability of this particular sample. Since the
likelihood function $L(\theta)$ and its logarithm, $\ln L(\theta)$, are maximized for the same
value $\theta$, either $L(\theta)$ or $\ln L(\theta)$ can be used. Here

$$\ln L(\theta) = \left(\sum_{i=1}^{n} x_i\right) \ln \theta + \left(n - \sum_{i=1}^{n} x_i\right) \ln (1 - \theta);$$

so we have

$$\frac{d \ln L(\theta)}{d\theta} = \frac{\sum x_i}{\theta} - \frac{n - \sum x_i}{1 - \theta} = 0,$$

provided that $\theta$ is not equal to zero or 1. This is equivalent to the equation

$$(1 - \theta) \sum_{i=1}^{n} x_i = \theta \left(n - \sum_{i=1}^{n} x_i\right),$$

whose solution for $\theta$ is $\sum x_i/n$. That $\sum x_i/n$ actually maximizes $L(\theta)$ and
$\ln L(\theta)$ can be easily checked, even in the cases in which all of $x_1, x_2, \ldots, x_n$
equal zero together or 1 together. That is, $\sum x_i/n$ is the value of $\theta$ that
maximizes $L(\theta)$. The corresponding statistic,

$$\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X},$$

is called the maximum likelihood estimator of $\theta$. The observed value of $\hat{\theta}$,
namely $\sum x_i/n$, is called the maximum likelihood estimate of $\theta$. For a simple
example, suppose that $n = 3$, and $x_1 = 1$, $x_2 = 0$, $x_3 = 1$; then $L(\theta) = \theta^2(1 - \theta)$
and the observed $\hat{\theta} = \frac{2}{3}$ is the maximum likelihood estimate of $\theta$.
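The closing numerical case of Example 1 can be checked by brute force. The sketch below is only an illustration: the function name `likelihood` and the grid of 1001 candidate values are assumptions made here, not part of the text. It evaluates $L(\theta) = \theta^{\sum x_i}(1-\theta)^{n-\sum x_i}$ on a grid and compares the maximizer with the closed form $\sum x_i/n$.

```python
def likelihood(theta, xs):
    """Bernoulli likelihood L(theta) = theta^(sum x) * (1 - theta)^(n - sum x)."""
    s = sum(xs)
    return theta ** s * (1 - theta) ** (len(xs) - s)

xs = [1, 0, 1]                                   # the sample used in Example 1
grid = [k / 1000 for k in range(1001)]           # candidate values 0.000, ..., 1.000
theta_grid = max(grid, key=lambda t: likelihood(t, xs))
theta_hat = sum(xs) / len(xs)                    # closed-form m.l.e., the sample mean

print(theta_grid, theta_hat)
```

The grid maximizer agrees with $\sum x_i/n = 2/3$ up to the grid spacing, as the calculus argument predicts.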

The principle of the method of maximum likelihood can now be
formulated easily. Consider a random sample $X_1, X_2, \ldots, X_n$ from a
distribution having p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The joint p.d.f. of
$X_1, X_2, \ldots, X_n$ is $f(x_1; \theta)f(x_2; \theta) \cdots f(x_n; \theta)$. This joint p.d.f. may be
regarded as a function of $\theta$. When so regarded, it is called the likelihood
function $L$ of the random sample, and we write

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta)f(x_2; \theta) \cdots f(x_n; \theta), \quad \theta \in \Omega.$$

Suppose that we can find a nontrivial function of $x_1, x_2, \ldots, x_n$, say
$u(x_1, x_2, \ldots, x_n)$, such that, when $\theta$ is replaced by $u(x_1, x_2, \ldots, x_n)$, the
likelihood function $L$ is maximized. That is, $L[u(x_1, x_2, \ldots, x_n);
x_1, x_2, \ldots, x_n]$ is at least as great as $L(\theta; x_1, x_2, \ldots, x_n)$ for every $\theta \in \Omega$.
Then the statistic $u(X_1, X_2, \ldots, X_n)$ will be called a maximum likelihood
estimator (hereafter abbreviated m.l.e.) of $\theta$ and will be denoted by the
symbol $\hat{\theta} = u(X_1, X_2, \ldots, X_n)$. We remark that in many instances there
will be a unique m.l.e. $\hat{\theta}$ of a parameter $\theta$, and often it may be obtained
by the process of differentiation.


Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal
distribution $N(\theta, 1)$, $-\infty < \theta < \infty$. Here

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{1}{\sqrt{2\pi}}\right)^n \exp\left[-\sum_{i=1}^{n} \frac{(x_i - \theta)^2}{2}\right].$$

This function $L$ can be maximized by setting the first derivative of $L$, with
respect to $\theta$, equal to zero and solving the resulting equation for $\theta$. We note,
however, that each of the functions $L$ and $\ln L$ is maximized at the same value
of $\theta$. So it may be easier to solve

$$\frac{d \ln L(\theta; x_1, x_2, \ldots, x_n)}{d\theta} = 0.$$

For this example,

$$\frac{d \ln L(\theta; x_1, x_2, \ldots, x_n)}{d\theta} = \sum_{i=1}^{n} (x_i - \theta).$$

If this derivative is equated to zero, the solution for the parameter $\theta$ is
$u(x_1, x_2, \ldots, x_n) = \sum x_i/n$. That $\sum x_i/n$ actually maximizes $L$ is easily shown.
Thus the statistic

$$\hat{\theta} = u(X_1, X_2, \ldots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i = \bar{X}$$

is the unique m.l.e. of the mean $\theta$.

It is interesting to note that in both Examples 1 and 2, it is true that
$E(\hat{\theta}) = \theta$. That is, in each of these cases, the expected value of the
estimator is equal to the corresponding parameter, which leads to the
following definition.

Definition 1. Any statistic whose mathematical expectation is equal
to a parameter $\theta$ is called an unbiased estimator of the parameter $\theta$.
Otherwise, the statistic is said to be biased.

Example 3. Let

$$f(x; \theta) = \frac{1}{\theta}, \quad 0 < x \le \theta, \quad 0 < \theta < \infty,$$

and $f(x; \theta) = 0$ elsewhere, and let $X_1, X_2, \ldots, X_n$ denote a random sample
from this distribution. Note that we have taken $0 < x \le \theta$ instead of
$0 < x < \theta$ so as to avoid a discussion of supremum versus maximum. Here

$$L(\theta; x_1, x_2, \ldots, x_n) = \frac{1}{\theta^n}, \quad 0 < x_i \le \theta,$$

which is an ever-decreasing function of $\theta$. The maximum of such functions
cannot be found by differentiation but by selecting $\theta$ as small as possible. Now
$\theta \ge$ each $x_i$; in particular, then, $\theta \ge \max (x_i)$. Thus $L$ can be made no larger
than

$$\frac{1}{[\max (x_i)]^n}$$

and the unique m.l.e. $\hat{\theta}$ of $\theta$ in this example is the $n$th order statistic $\max (X_i)$.
It can be shown that $E[\max (X_i)] = n\theta/(n + 1)$. Thus, in this instance, the
m.l.e. of the parameter $\theta$ is biased. That is, the property of unbiasedness is not
in general a property of a m.l.e.

While the m.l.e. $\hat{\theta}$ of $\theta$ in Example 3 is a biased estimator, results
in Chapter 5 show that the $n$th order statistic $\hat{\theta} = \max (X_i) = Y_n$
converges in probability to $\theta$. Thus, in accordance with the following
definition, we say that $\hat{\theta} = Y_n$ is a consistent estimator of $\theta$.

Definition 2. Any statistic that converges in probability to a
parameter $\theta$ is called a consistent estimator of that parameter $\theta$.

Consistency is a desirable property of an estimator; and, in all cases
of practical interest, maximum likelihood estimators are consistent.
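A small simulation can make the two properties of Example 3 concrete: the m.l.e. $\max(X_i)$ is biased for small $n$ yet approaches $\theta$ as $n$ grows. This is a sketch only; the helper name `mean_of_max`, the value $\theta = 2$, the seed, and the replication counts are all assumptions chosen here for illustration.

```python
import random

def mean_of_max(n, theta, reps, seed=0):
    """Average of max(X_1, ..., X_n) over `reps` simulated uniform(0, theta) samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        total += max(rng.uniform(0, theta) for _ in range(n))
    return total / reps

theta = 2.0
mean_5 = mean_of_max(5, theta, reps=20000)      # should be near 5*theta/6, below theta
mean_500 = mean_of_max(500, theta, reps=2000)   # should be very close to theta

print(mean_5, 5 * theta / 6)
print(mean_500, theta)
```

With $n = 5$ the simulated mean sits near $n\theta/(n+1)$ rather than $\theta$, while with $n = 500$ the estimator is already concentrated just below $\theta$.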

The preceding definitions and properties are easily generalized. Let
$X, Y, \ldots, Z$ denote random variables that may or may not
be independent and that may or may not be identically distributed. Let
the joint p.d.f. $g(x, y, \ldots, z; \theta_1, \theta_2, \ldots, \theta_m)$, $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$,
depend on $m$ parameters. This joint p.d.f., when regarded as a
function of $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$, is called the likelihood function
of the random variables. Then those functions $u_1(x, y, \ldots, z)$,
$u_2(x, y, \ldots, z), \ldots, u_m(x, y, \ldots, z)$ that maximize this likelihood
function with respect to $\theta_1, \theta_2, \ldots, \theta_m$, respectively, define the
maximum likelihood estimators

$$\hat{\theta}_1 = u_1(X, Y, \ldots, Z), \quad \hat{\theta}_2 = u_2(X, Y, \ldots, Z), \quad \ldots, \quad \hat{\theta}_m = u_m(X, Y, \ldots, Z)$$

of the $m$ parameters.

Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. We shall find
$\hat{\theta}_1$ and $\hat{\theta}_2$, the maximum likelihood estimators of $\theta_1$ and $\theta_2$. The logarithm
of the likelihood function may be written in the form

$$\ln L(\theta_1, \theta_2; x_1, \ldots, x_n) = -\frac{\sum_{i=1}^{n} (x_i - \theta_1)^2}{2\theta_2} - \frac{n \ln (2\pi\theta_2)}{2}.$$

We observe that we may maximize by differentiation. We have

$$\frac{\partial \ln L}{\partial \theta_1} = \frac{\sum_{i=1}^{n} (x_i - \theta_1)}{\theta_2}, \qquad \frac{\partial \ln L}{\partial \theta_2} = \frac{\sum_{i=1}^{n} (x_i - \theta_1)^2}{2\theta_2^2} - \frac{n}{2\theta_2}.$$

If we equate these partial derivatives to zero and solve simultaneously the two
equations thus obtained, the solutions for $\theta_1$ and $\theta_2$ are found to be
$\sum x_i/n = \bar{x}$ and $\sum (x_i - \bar{x})^2/n = s^2$, respectively. It can be verified that these
solutions maximize $L$. Thus the maximum likelihood estimators of $\theta_1 = \mu$
and $\theta_2 = \sigma^2$ are, respectively, the mean and the variance of the sample, namely
$\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$. Whereas $\hat{\theta}_1$ is an unbiased estimator of $\theta_1$, the estimator
$\hat{\theta}_2 = S^2$ is biased because

$$E(\hat{\theta}_2) = \frac{\sigma^2}{n} E\left(\frac{nS^2}{\sigma^2}\right) = \frac{(n-1)\sigma^2}{n} = \frac{(n-1)\theta_2}{n}.$$

However, in Chapter 5 it has been shown that $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$ converge
in probability to $\theta_1$ and $\theta_2$, respectively, and thus they are consistent estimators
of $\theta_1$ and $\theta_2$.
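The estimators of Example 4 are easy to compute directly. The sketch below assumes a small made-up data set (the values in `xs` are hypothetical, as is the function name `normal_mle`); it returns $\bar{x}$ and $s^2$ with divisor $n$, and then rescales $s^2$ by $n/(n-1)$ to remove the bias factor $(n-1)/n$ derived above.

```python
def normal_mle(xs):
    """Maximum likelihood estimates (mean, variance) for a normal sample."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # divisor n, as in the text
    return mean, var

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # hypothetical observations
mean_hat, var_hat = normal_mle(xs)
unbiased_var = len(xs) / (len(xs) - 1) * var_hat  # undo the (n - 1)/n bias factor

print(mean_hat, var_hat, unbiased_var)
```

For large $n$ the two variance estimates differ very little, which is consistent with $S^2$ being biased but consistent.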


Suppose that we wish to estimate a function of $\theta$, say $h(\theta)$. For
convenience, let us say that $\eta = h(\theta)$ defines a one-to-one
transformation. Then the value of $\eta$, say $\hat{\eta}$, that maximizes the
likelihood function $L(\theta)$, or equivalently $L[\theta = h^{-1}(\eta)]$, is selected so
that $\hat{\theta} = h^{-1}(\hat{\eta})$, where $\hat{\theta}$ is the m.l.e. of $\theta$. Thus $\hat{\eta}$ is taken so that
$\hat{\eta} = h(\hat{\theta})$; that is,

$$\widehat{h(\theta)} = h(\hat{\theta}).$$

This result is called the invariance property of a maximum likelihood
estimator. For illustration, if $\eta = \theta^3$, where $\theta$ is the mean of $N(\theta, 1)$, then
$\hat{\eta} = \bar{X}^3$. While there is a little complication if $h(\theta)$ is not one-to-one, we
still use the fact that $\hat{\eta} = h(\hat{\theta})$. Thus if $\bar{X}$ is the mean of the sample from
$b(1, \theta)$, so that $\hat{\theta} = \bar{X}$, and if $\eta = \theta(1 - \theta)$, then $\hat{\eta} = \bar{X}(1 - \bar{X})$. These
ideas can be extended to more than one parameter. For illustration, in
Example 4, if $\eta = \theta_1 + 2\sqrt{\theta_2}$, then $\hat{\eta} = \bar{X} + 2S$.

Sometimes it is impossible to find maximum likelihood estimators
in a convenient closed form and numerical methods must be used to
maximize the likelihood function. For illustration, suppose that
$X_1, X_2, \ldots, X_n$ is a random sample from a gamma distribution with
parameters $\alpha = \theta_1$ and $\beta = \theta_2$, where $\theta_1 > 0$, $\theta_2 > 0$. It is difficult to
maximize

$$L(\theta_1, \theta_2; x_1, \ldots, x_n) = \left[\frac{1}{\Gamma(\theta_1)\,\theta_2^{\theta_1}}\right]^n (x_1 x_2 \cdots x_n)^{\theta_1 - 1} \exp\left(-\sum_{i=1}^{n} x_i/\theta_2\right)$$

with respect to $\theta_1$ and $\theta_2$, owing to the presence of the gamma function
$\Gamma(\theta_1)$. Thus numerical methods must be used to maximize $L$ once
$x_1, x_2, \ldots, x_n$ are observed.
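One possible numerical approach, offered here only as a sketch and not as the method the text has in mind, uses the fact that for fixed $\theta_1$ the likelihood is maximized at $\theta_2 = \bar{x}/\theta_1$. Substituting this leaves a function of $\theta_1$ alone, which can be maximized by a simple ternary search. The data in `xs`, the search bounds, and the function names are assumptions introduced for illustration.

```python
import math

def gamma_loglik(alpha, beta, xs):
    """Log-likelihood of a gamma sample with shape alpha and scale beta."""
    n = len(xs)
    return (-n * math.lgamma(alpha) - n * alpha * math.log(beta)
            + (alpha - 1) * sum(math.log(x) for x in xs)
            - sum(xs) / beta)

def gamma_mle(xs, lo=1e-3, hi=1e3, iters=200):
    """Ternary search over alpha; for each alpha the best beta is mean / alpha."""
    mean = sum(xs) / len(xs)
    profile = lambda a: gamma_loglik(a, mean / a, xs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if profile(m1) < profile(m2):
            lo = m1          # the maximum lies to the right of m1
        else:
            hi = m2          # the maximum lies to the left of m2
    alpha = (lo + hi) / 2
    return alpha, mean / alpha

xs = [1.2, 0.8, 2.5, 3.1, 0.6, 1.9, 2.2, 1.4]   # hypothetical positive data
alpha_hat, beta_hat = gamma_mle(xs)
print(alpha_hat, beta_hat)
```

The search relies on the profiled log-likelihood having a single peak in $\theta_1$; a library optimizer could be substituted when one is available.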

There are other ways, however, to obtain easily point estimates of
$\theta_1$ and $\theta_2$. For illustration, in the gamma distribution situation, let us
simply equate the first two moments of the distribution to the
corresponding moments of the sample. This seems like a reasonable
way in which to find estimators, since the empirical distribution $F_n(x)$
converges in probability to $F(x)$, and hence corresponding moments
should be about equal. Here in this illustration we have

$$\theta_1\theta_2 = \bar{X}, \qquad \theta_1\theta_2^2 = S^2,$$

the solutions of which are

$$\tilde{\theta}_1 = \frac{\bar{X}^2}{S^2} \quad \text{and} \quad \tilde{\theta}_2 = \frac{S^2}{\bar{X}}.$$

We say that these latter two statistics, $\tilde{\theta}_1$ and $\tilde{\theta}_2$, are respective
estimators of $\theta_1$ and $\theta_2$ found by the method of moments.
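The two moment equations for the gamma distribution translate directly into code. In this sketch the function name `gamma_method_of_moments` and the data are hypothetical; $S^2$ is computed with divisor $n$, matching the convention used in this section.

```python
def gamma_method_of_moments(xs):
    """Solve theta1*theta2 = mean and theta1*theta2**2 = variance for (theta1, theta2)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n   # sample variance with divisor n
    return mean ** 2 / var, var / mean

xs = [1.2, 0.8, 2.5, 3.1, 0.6, 1.9, 2.2, 1.4]   # hypothetical positive data
theta1_tilde, theta2_tilde = gamma_method_of_moments(xs)
print(theta1_tilde, theta2_tilde)
```

These closed-form values are also a natural starting point when the likelihood has to be maximized numerically.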

To generalize the discussion of the preceding paragraph, let
$X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a distribution with
p.d.f. $f(x; \theta_1, \theta_2, \ldots, \theta_r)$, $(\theta_1, \ldots, \theta_r) \in \Omega$. The expectation $E(X^k)$ is
frequently called the $k$th moment of the distribution, $k = 1, 2, 3, \ldots$.
The sum $M_k = \sum_{i=1}^{n} X_i^k/n$ is the $k$th moment of the sample,
$k = 1, 2, 3, \ldots$. The method of moments can be described as follows.
Equate $E(X^k)$ to $M_k$, beginning with $k = 1$ and continuing until there
are enough equations to provide unique solutions for $\theta_1, \theta_2, \ldots, \theta_r$,
say $h_i(M_1, M_2, \ldots)$, $i = 1, 2, \ldots, r$, respectively. It should be noted
that this could be done in an equivalent manner by equating $\mu = E(X)$
to $\bar{X}$ and $E[(X - \mu)^k]$ to $\sum_{i=1}^{n} (X_i - \bar{X})^k/n$, $k = 2, 3$, and so on until unique
solutions for $\theta_1, \theta_2, \ldots, \theta_r$ are obtained. This alternative procedure
was used in the preceding illustration. In most practical cases, the
estimator $\tilde{\theta}_i = h_i(M_1, M_2, \ldots)$ of $\theta_i$, found by the method of moments,
is a consistent estimator of $\theta_i$, $i = 1, 2, \ldots, r$.

EXERCISES 

6.1. Let $X_1, X_2, \ldots, X_n$ represent a random sample from each of the
distributions having the following probability density functions:

(a) $f(x; \theta) = \theta^x e^{-\theta}/x!$, $x = 0, 1, 2, \ldots$, $0 \le \theta < \infty$, zero elsewhere, where
$f(0; 0) = 1$.

(b) $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, $0 < \theta < \infty$, zero elsewhere.

(c) $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere.

(d) $f(x; \theta) = \frac{1}{2}e^{-|x - \theta|}$, $-\infty < x < \infty$, $-\infty < \theta < \infty$.

(e) $f(x; \theta) = e^{-(x - \theta)}$, $\theta \le x < \infty$, $-\infty < \theta < \infty$, zero elsewhere.

In each case find the m.l.e. $\hat{\theta}$ of $\theta$.

6.2. Let $X_1, X_2, \ldots, X_n$ be i.i.d., each with the distribution having p.d.f.
$f(x; \theta_1, \theta_2) = (1/\theta_2)e^{-(x - \theta_1)/\theta_2}$, $\theta_1 \le x < \infty$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$,
zero elsewhere. Find the maximum likelihood estimators of $\theta_1$ and $\theta_2$.

6.3. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from
a distribution with p.d.f. $f(x; \theta) = 1$, $\theta - \frac{1}{2} \le x \le \theta + \frac{1}{2}$, $-\infty < \theta < \infty$,
zero elsewhere. Show that every statistic $u(X_1, X_2, \ldots, X_n)$ such that

$$Y_n - \tfrac{1}{2} \le u(X_1, X_2, \ldots, X_n) \le Y_1 + \tfrac{1}{2}$$

is a m.l.e. of $\theta$. In particular, $(4Y_1 + 2Y_n + 1)/6$, $(Y_1 + Y_n)/2$, and
$(2Y_1 + 4Y_n - 1)/6$ are three such statistics. Thus uniqueness is not in general a
property of a m.l.e.

6.4. Let $X_1$, $X_2$, and $X_3$ have the multinomial distribution in which $n = 25$,
$k = 4$, and the unknown probabilities are $\theta_1$, $\theta_2$, and $\theta_3$, respectively.
Here we can, for convenience, let $X_4 = 25 - X_1 - X_2 - X_3$ and
$\theta_4 = 1 - \theta_1 - \theta_2 - \theta_3$. If the observed values of the random variables are
$x_1 = 4$, $x_2 = 11$, and $x_3 = 7$, find the maximum likelihood estimates of $\theta_1$,
$\theta_2$, and $\theta_3$.

6.5. The Pareto distribution is frequently used as a model in the study of incomes
and has the distribution function

$$F(x; \theta_1, \theta_2) = 1 - (\theta_1/x)^{\theta_2}, \quad \theta_1 \le x, \quad \text{zero elsewhere},$$

where $\theta_1 > 0$ and $\theta_2 > 0$.

If $X_1, X_2, \ldots, X_n$ is a random sample from this distribution, find the
maximum likelihood estimators of $\theta_1$ and $\theta_2$.

6.6. Let $Y_n$ be a statistic such that $\lim_{n \to \infty} E(Y_n) = \theta$ and $\lim_{n \to \infty} \sigma^2_{Y_n} = 0$. Prove that
$Y_n$ is a consistent estimator of $\theta$.

Hint: $\Pr (|Y_n - \theta| \ge \epsilon) \le E[(Y_n - \theta)^2]/\epsilon^2$ and $E[(Y_n - \theta)^2] = [E(Y_n - \theta)]^2 + \sigma^2_{Y_n}$. Why?

6.7. For each of the distributions in Exercise 6.1, find an estimator of $\theta$ by
the method of moments and show that it is consistent.

6.8. If a random sample of size $n$ is taken from a distribution having p.d.f.
$f(x; \theta) = 2x/\theta^2$, $0 < x \le \theta$, zero elsewhere, find:

(a) The m.l.e. $\hat{\theta}$ for $\theta$.

(b) The constant $c$ so that $E(c\hat{\theta}) = \theta$.

(c) The m.l.e. for the median of the distribution.

6.9. Let $X_1, X_2, \ldots, X_n$ be i.i.d., each with a distribution with p.d.f.
$f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere. Find the m.l.e. of $\Pr (X \le 2)$.

6.10. Let $X$ have a binomial distribution with parameters $n$ and $p$. The
variance of $X/n$ is $p(1 - p)/n$; this is sometimes estimated by the m.l.e.

$$\frac{X}{n}\left(1 - \frac{X}{n}\right)\Big/ n.$$

Is this an unbiased estimator of $p(1 - p)/n$? If not, can you
construct one by multiplying this one by a constant?

6.11. Let the table

x            0    1    2    3    4    5
Frequency    6   10   14   13    6    1

represent a summary of a sample of size 50 from a binomial distribution
having $n = 5$. Find the m.l.e. of $\Pr (X \ge 3)$.

6.12. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the uniform distribution of the continuous type over the closed
interval $[\theta - \rho, \theta + \rho]$. Find the maximum likelihood estimators for $\theta$ and
$\rho$. Are these two unbiased estimators?

6.13. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a Cauchy distribution
with median $\theta$, that is, with p.d.f.

$$f(x; \theta) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}, \quad -\infty < x < \infty,$$

where $-\infty < \theta < \infty$. If $x_1 = -1.94$, $x_2 = 0.59$, $x_3 = -5.98$,
$x_4 = -0.08$, $x_5 = -0.77$, find by numerical methods the m.l.e. of $\theta$.

6.2 Confidence Intervals for Means




Suppose that we are willing to accept as a fact that the (numerical)
outcome $X$ of a random experiment is a random variable that has a
normal distribution with known variance $\sigma^2$ but unknown mean $\mu$.
That is, $\mu$ is some constant, but its value is unknown. To elicit some
information about $\mu$, we decide to repeat the random experiment
under identical conditions $n$ independent times, $n$ being a fixed
positive integer. Let the random variables $X_1, X_2, \ldots, X_n$ denote,
respectively, the outcomes to be obtained on these $n$ repetitions of the
experiment. If our assumptions are fulfilled, we then have under
consideration a random sample $X_1, X_2, \ldots, X_n$ from a distribution
that is $N(\mu, \sigma^2)$, $\sigma^2$ known. Consider the maximum likelihood estimator
of $\mu$, namely $\hat{\mu} = \bar{X}$. Of course, $\bar{X}$ is $N(\mu, \sigma^2/n)$ and $(\bar{X} - \mu)/(\sigma/\sqrt{n})$
is $N(0, 1)$. Thus

$$\Pr\left(-2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2\right) = 0.954.$$

However, the events

$$-2 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2, \qquad \frac{-2\sigma}{\sqrt{n}} < \bar{X} - \mu < \frac{2\sigma}{\sqrt{n}},$$

and

$$\bar{X} - \frac{2\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{2\sigma}{\sqrt{n}}$$

are equivalent. Thus these events have the same probability. That is,

$$\Pr\left(\bar{X} - \frac{2\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{2\sigma}{\sqrt{n}}\right) = 0.954.$$

Since $\sigma$ is a known number, each of the random variables $\bar{X} - 2\sigma/\sqrt{n}$
and $\bar{X} + 2\sigma/\sqrt{n}$ is a statistic. The interval $(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$
is a random interval. In this case, both end points of the interval are
statistics. The immediately preceding probability statement can be
read: Prior to the repeated independent performances of the random
experiment, the probability is 0.954 that the random interval
$(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$ includes the unknown fixed point
(parameter) $\mu$.

Up to this point, only probability has been involved;
the determination of the p.d.f. of $\bar{X}$ and the determination of
the random interval were problems of probability. Now the
problem becomes statistical. Suppose the experiment yields
$X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$. Then the sample value of $\bar{X}$ is
$\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, a known number. Moreover, since $\sigma$
is known, the interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$ has known
endpoints. Obviously, we cannot say that 0.954 is the probability that
the particular interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$ includes the
parameter $\mu$, for $\mu$, although unknown, is some constant, and this
particular interval either does or does not include $\mu$. However, the
fact that we had such a high probability, prior to the performance of
the experiment, that the random interval $(\bar{X} - 2\sigma/\sqrt{n}, \bar{X} + 2\sigma/\sqrt{n})$
includes the fixed point (parameter) $\mu$, leads us to have some
reliance on the particular interval $(\bar{x} - 2\sigma/\sqrt{n}, \bar{x} + 2\sigma/\sqrt{n})$. This
reliance is reflected by calling the known interval $(\bar{x} - 2\sigma/\sqrt{n},
\bar{x} + 2\sigma/\sqrt{n})$ a 95.4 percent confidence interval for $\mu$. The number 0.954
is called the confidence coefficient. The confidence coefficient is equal
to the probability that the random interval includes the parameter. One
may, of course, obtain an 80, a 90, or a 99 percent confidence interval
for $\mu$ by using 1.282, 1.645, or 2.576, respectively, instead of the
constant 2.

A statistical inference of this sort is an example of interval
estimation of a parameter. Note that the interval estimate of $\mu$ is found
by taking a good (here maximum likelihood) estimate $\bar{x}$ of $\mu$ and adding
and subtracting twice the standard deviation of $\bar{X}$, namely
$2\sigma/\sqrt{n}$, which is small if $n$ is large. If $\sigma$ were not known, the end points
of the random interval would not be statistics. Although the probability
statement about the random interval remains valid, the sample
data would not yield an interval with known end points.

Example 1. If in the preceding discussion $n = 40$, $\sigma^2 = 10$, and $\bar{x} = 7.164$,
then $(7.164 - 1.282\sqrt{10/40},\ 7.164 + 1.282\sqrt{10/40})$, or $(6.523, 7.805)$, is an 80 percent
confidence interval for $\mu$. Thus we have an interval estimate of $\mu$.
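The arithmetic of Example 1 can be reproduced with the standard library. This is a sketch under the assumptions of the example (normal sample, known $\sigma^2$); the function name `z_interval` is introduced here, and the normal quantile is obtained from `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma2, n, conf):
    """Confidence interval for the mean when the variance sigma2 is known."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # about 1.282 when conf = 0.80
    half = z * sqrt(sigma2 / n)
    return xbar - half, xbar + half

low, high = z_interval(7.164, 10, 40, 0.80)
print(round(low, 3), round(high, 3))   # 6.523 7.805, as in Example 1
```

Changing `conf` to 0.90 or 0.99 uses the quantiles 1.645 and 2.576 mentioned in the text.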

In the next example we show how the central limit theorem may
be used to help us find an approximate confidence interval for $\mu$ when
our sample arises from a distribution that is not normal.

Example 2. Let $\bar{X}$ denote the mean of a random sample of size 25 from
a distribution having variance $\sigma^2 = 100$ and mean $\mu$. Since $\sigma/\sqrt{n} = 2$,
then approximately

$$\Pr\left(-1.96 < \frac{\bar{X} - \mu}{2} < 1.96\right) = 0.95,$$

or

$$\Pr (\bar{X} - 3.92 < \mu < \bar{X} + 3.92) = 0.95.$$

Let the observed mean of the sample be $\bar{x} = 67.53$. Accordingly, the interval
from $\bar{x} - 3.92 = 63.61$ to $\bar{x} + 3.92 = 71.45$ is an approximate 95 percent
confidence interval for the mean $\mu$.

Let us now turn to the problem of finding a confidence interval for
the mean $\mu$ of a normal distribution when we are not so fortunate as
to know the variance $\sigma^2$. From Section 4.8, we know that

$$T = \frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{nS^2/[\sigma^2(n - 1)]}} = \frac{\bar{X} - \mu}{S/\sqrt{n - 1}}$$

has a $t$-distribution with $n - 1$ degrees of freedom, whatever the value
of $\sigma^2 > 0$. For a given positive integer $n$ and a probability of 0.95, say,
we can find a number $b$ from Table IV in Appendix B, such that

$$\Pr\left(-b < \frac{\bar{X} - \mu}{S/\sqrt{n - 1}} < b\right) = 0.95,$$

which can be written in the form

$$\Pr\left(\bar{X} - \frac{bS}{\sqrt{n - 1}} < \mu < \bar{X} + \frac{bS}{\sqrt{n - 1}}\right) = 0.95.$$

Then the interval $[\bar{X} - (bS/\sqrt{n - 1}), \bar{X} + (bS/\sqrt{n - 1})]$ is a random
interval having probability 0.95 of including the unknown fixed point
(parameter) $\mu$. If the experimental values of $X_1, X_2, \ldots, X_n$ are
$x_1, x_2, \ldots, x_n$ with $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/n$, where $\bar{x} = \sum_{i=1}^{n} x_i/n$, then the
interval $[\bar{x} - (bs/\sqrt{n - 1}), \bar{x} + (bs/\sqrt{n - 1})]$ is a 95 percent confidence
interval for $\mu$ for every $\sigma^2 > 0$. Again this interval estimate of $\mu$ is found
by adding and subtracting a quantity, here $bs/\sqrt{n - 1}$, to the point
estimate $\bar{x}$.

Example 3. If in the preceding discussion $n = 10$, $\bar{x} = 3.22$, and $s = 1.17$,
then the interval $[3.22 - (2.262)(1.17)/\sqrt{9},\ 3.22 + (2.262)(1.17)/\sqrt{9}]$ or
$(2.34, 4.10)$ is a 95 percent confidence interval for $\mu$.
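Example 3 can be checked in a few lines. The sketch below takes the table value $b = 2.262$ as given rather than computing a $t$ quantile, and it follows this section's convention that $s$ uses divisor $n$, so the half-width is $bs/\sqrt{n-1}$. The function name `t_interval` is an assumption made here.

```python
from math import sqrt

def t_interval(xbar, s, n, b):
    """Interval xbar +/- b*s/sqrt(n - 1), where s is computed with divisor n."""
    half = b * s / sqrt(n - 1)
    return xbar - half, xbar + half

low, high = t_interval(3.22, 1.17, 10, 2.262)   # b taken from a t table, 9 d.f.
print(round(low, 2), round(high, 2))            # 2.34 4.1, as in Example 3
```

If $s$ were instead computed with divisor $n - 1$, the half-width would be $bs/\sqrt{n}$; the two forms give the same interval.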

Remark. If one wishes to find a confidence interval for $\mu$ and if the
variance $\sigma^2$ of the nonnormal distribution is unknown (unlike Example 2 of
this section), he may with large samples proceed as follows. If certain weak
conditions are satisfied, then $S^2$, the variance of a random sample of size $n \ge 2$,
converges in probability to $\sigma^2$. Then in

$$\frac{\sqrt{n}(\bar{X} - \mu)/\sigma}{\sqrt{nS^2/[(n - 1)\sigma^2]}} = \frac{\sqrt{n - 1}(\bar{X} - \mu)}{S}$$

the numerator of the left-hand member has a limiting distribution that is
$N(0, 1)$ and the denominator of that member converges in probability to 1.
Thus $\sqrt{n - 1}(\bar{X} - \mu)/S$ has a limiting distribution that is $N(0, 1)$. This fact
enables us to find approximate confidence intervals for $\mu$ when our conditions
are satisfied. This procedure works particularly well when the underlying
nonnormal distribution is symmetric, because then $\bar{X}$ and $S^2$ are uncorrelated
(the proof of which is beyond the level of the text). As the underlying
distribution becomes more skewed, however, the sample size must be larger
to achieve good approximations to the desired probabilities. A similar
procedure can be followed in the next section when seeking confidence
intervals for the difference of the means of two nonnormal distributions.

We shall now consider the problem of determining a confidence
interval for the unknown parameter $p$ of a binomial distribution when
the parameter $n$ is known. Let $Y$ be $b(n, p)$, where $0 < p < 1$ and $n$ is
known. Then $p$ is the mean of $Y/n$. We shall use a result of Example
1, Section 5.5, to find an approximate 95.4 percent confidence interval
for the mean $p$. There we found that


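$$\Pr\left[-2 < \frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} < 2\right] = 0.954,$$

approximately. Since

$$\frac{Y - np}{\sqrt{n(Y/n)(1 - Y/n)}} = \frac{(Y/n) - p}{\sqrt{(Y/n)(1 - Y/n)/n}},$$

the probability statement above can easily be written in the form

$$\Pr\left[\frac{Y}{n} - 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}} < p < \frac{Y}{n} + 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}}\right] = 0.954,$$

approximately. Thus, for large $n$, if the experimental value of $Y$ is $y$,
the interval

$$\left[\frac{y}{n} - 2\sqrt{\frac{(y/n)(1 - y/n)}{n}},\ \frac{y}{n} + 2\sqrt{\frac{(y/n)(1 - y/n)}{n}}\right]$$

provides an approximate 95.4 percent confidence interval for $p$.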

A more complicated approximate 95.4 percent confidence interval
can be obtained from the fact that $Z = (Y - np)/\sqrt{np(1 - p)}$ has a
limiting distribution that is $N(0, 1)$, and the fact that the event
$-2 < Z < 2$ is equivalent to the event

$$\frac{Y + 2 - 2\sqrt{[Y(n - Y)/n] + 1}}{n + 4} < p < \frac{Y + 2 + 2\sqrt{[Y(n - Y)/n] + 1}}{n + 4}. \tag{1}$$

The first of these facts was established in Chapter 5, and the proof of
inequalities (1) is left as an exercise. Thus an experimental value $y$ of
$Y$ may be used in inequalities (1) to determine an approximate 95.4
percent confidence interval for $p$.

If one wishes a 95 percent confidence interval for $p$ that does not
depend upon limiting distribution theory, he or she may use the
following approach. (This approach is quite general and can be used
in other instances; see Exercise 6.21.) Determine two increasing
functions of $p$, say $c_1(p)$ and $c_2(p)$, such that for each value of $p$ we have,
at least approximately,

$$\Pr [c_1(p) < Y < c_2(p)] = 0.95.$$

The reason that this may be approximate is due to the fact that $Y$ has
a distribution of the discrete type and thus it is, in general, impossible
to achieve the probability 0.95 exactly. With $c_1(p)$ and $c_2(p)$ increasing
functions, they have single-valued inverses, say $d_1(y)$ and $d_2(y)$,
respectively. Thus the events $c_1(p) < Y < c_2(p)$ and $d_2(Y) < p < d_1(Y)$
are equivalent and we have, at least approximately,

$$\Pr [d_2(Y) < p < d_1(Y)] = 0.95.$$

In the case of the binomial distribution, the functions $c_1(p)$, $c_2(p)$, $d_2(y)$,
and $d_1(y)$ cannot be found explicitly, but a number of books provide
tables of $d_2(y)$ and $d_1(y)$ for various values of $n$.

Example 4. If, in the preceding discussion, we take $n = 100$ and $y = 20$, the
first approximate 95.4 percent confidence interval is given by
$(0.2 - 2\sqrt{(0.2)(0.8)/100},\ 0.2 + 2\sqrt{(0.2)(0.8)/100})$ or $(0.12, 0.28)$. The
approximate 95.4 percent confidence interval provided by inequalities (1) is

$$\left(\frac{22 - 2\sqrt{(1600/100) + 1}}{104},\ \frac{22 + 2\sqrt{(1600/100) + 1}}{104}\right)$$

or $(0.13, 0.29)$. By referring to the appropriate tables found elsewhere, we find
that an approximate 95 percent confidence interval has the limits $d_2(20) = 0.13$
and $d_1(20) = 0.29$. Thus, in this example, we see that all three methods yield
results that are in substantial agreement.
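The first two intervals of Example 4 follow from short formulas and can be recomputed as below. This is a sketch; the function names `simple_interval` and `quadratic_interval` are assumptions, and the table-based limits $d_2(20)$ and $d_1(20)$ are not reproduced because they depend on tables not given in the text.

```python
from math import sqrt

def simple_interval(y, n, z=2.0):
    """Interval y/n +/- z*sqrt((y/n)(1 - y/n)/n)."""
    p = y / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def quadratic_interval(y, n):
    """Interval from inequalities (1), obtained by solving Z**2 < 4 for p."""
    root = 2 * sqrt(y * (n - y) / n + 1)
    return (y + 2 - root) / (n + 4), (y + 2 + root) / (n + 4)

s_low, s_high = simple_interval(20, 100)
q_low, q_high = quadratic_interval(20, 100)
print(round(s_low, 2), round(s_high, 2))   # 0.12 0.28
print(round(q_low, 2), round(q_high, 2))   # 0.13 0.29
```

The second interval is not centered at $y/n$; its midpoint is $(y + 2)/(n + 4)$, which pulls the estimate slightly toward $\frac{1}{2}$.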

Remark. The fact that the variance of $Y/n$ is a function of $p$ caused us
some difficulty in finding a confidence interval for $p$. Another way of handling
the problem is to try to find a function $u(Y/n)$ of $Y/n$, whose variance is
essentially free of $p$. In Section 5.4, we proved that

$$u\left(\frac{Y}{n}\right) = \arcsin \sqrt{\frac{Y}{n}}$$

has an approximate normal distribution with mean $\arcsin \sqrt{p}$ and variance
$1/4n$. Hence we could find an approximate 95.4 percent confidence interval by
using

$$\Pr\left[-2 < \frac{\arcsin \sqrt{Y/n} - \arcsin \sqrt{p}}{\sqrt{1/4n}} < 2\right] = 0.954$$

and solving the inequalities for $p$.

Example 5. Suppose that we sample from a distribution with unknown
mean $\mu$ and variance $\sigma^2 = 225$. We want to find the sample size $n$ so that $\bar{x} \pm 1$
(which means $\bar{x} - 1$ to $\bar{x} + 1$) serves as a 95 percent confidence interval
for $\mu$. Using the fact that the sample mean of the observations, $\bar{X}$, is
approximately $N(\mu, \sigma^2/n)$, we see that the interval given by $\bar{x} \pm 1.96(15/\sqrt{n})$
will serve as an approximate 95 percent confidence interval for $\mu$. That is, we
want

$$1.96\left(\frac{15}{\sqrt{n}}\right) = 1$$

or, equivalently,

$$\sqrt{n} = 29.4, \quad \text{and thus} \quad n \approx 864.36$$

or $n = 865$ because $n$ must be an integer. Suppose, however, we could not
afford to take 865 observations. In that case, the accuracy or confidence level
could possibly be relaxed some. For illustration, rather than requiring $\bar{x} \pm 1$
to be a 95 percent confidence interval for $\mu$, possibly $\bar{x} \pm 2$ would be a
satisfactory 80 percent one. If this modification is acceptable, we now have

$$1.282\left(\frac{15}{\sqrt{n}}\right) = 2$$

or, equivalently,

$$\sqrt{n} = 9.615 \quad \text{and} \quad n \approx 92.4.$$

Since $n$ must be an integer, we would probably use 93 in practice. Most likely,
the persons involved in this project would find this is a more reasonable sample
size.
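The sample-size calculation of Example 5 amounts to solving $z\sigma/\sqrt{n} = h$ for $n$ and rounding up. The helper below is a sketch; its name `sample_size` and argument names are assumptions, and the quantiles 1.96 and 1.282 are supplied by hand as in the text.

```python
from math import ceil

def sample_size(sigma, half_width, z):
    """Smallest integer n with z * sigma / sqrt(n) <= half_width."""
    return ceil((z * sigma / half_width) ** 2)

n_95 = sample_size(15, 1, 1.96)    # 95 percent interval xbar +/- 1
n_80 = sample_size(15, 2, 1.282)   # 80 percent interval xbar +/- 2
print(n_95, n_80)                  # 865 93
```

Doubling the allowed half-width alone cuts the required $n$ by a factor of four; lowering the confidence level reduces it further.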

EXERCISES 

6.14. Let the observed value of the mean $\bar{X}$ of a random sample of size 20 from
a distribution that is $N(\mu, 80)$ be 81.2. Find a 95 percent confidence interval
for $\mu$.

6.15. Let $\bar{X}$ be the mean of a random sample of size $n$ from a distribution that
is $N(\mu, 9)$. Find $n$ such that $\Pr (\bar{X} - 1 < \mu < \bar{X} + 1) = 0.90$, approximately.

6.16. Let a random sample of size 17 from the normal distribution $N(\mu, \sigma^2)$
yield $\bar{x} = 4.7$ and $s^2 = 5.76$. Determine a 90 percent confidence interval for
$\mu$.

6.17. Let $\bar{X}$ denote the mean of a random sample of size $n$ from a distribution
that has mean $\mu$ and variance $\sigma^2 = 10$. Find $n$ so that the probability is
approximately 0.954 that the random interval $(\bar{X} - \frac{1}{2}, \bar{X} + \frac{1}{2})$ includes $\mu$.

6.18. Let $X_1, X_2, \ldots, X_9$ be a random sample of size 9 from a distribution that
is $N(\mu, \sigma^2)$.

(a) If $\sigma$ is known, find the length of a 95 percent confidence interval for $\mu$
if this interval is based on the random variable $\sqrt{9}(\bar{X} - \mu)/\sigma$.

(b) If $\sigma$ is unknown, find the expected value of the length of a 95 percent
confidence interval for $\mu$ if this interval is based on the random variable
$\sqrt{8}(\bar{X} - \mu)/S$.

Hint: Write $E(S) = (\sigma/\sqrt{n})E[(nS^2/\sigma^2)^{1/2}]$.

(c) Compare these two answers.

6.19. Let $X_1, X_2, \ldots, X_n, X_{n+1}$ be a random sample of size $n + 1$, $n > 1$, from
a distribution that is $N(\mu, \sigma^2)$. Let $\bar{X} = \sum_{i=1}^{n} X_i/n$ and $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/n$. Find
the constant $c$ so that the statistic $c(\bar{X} - X_{n+1})/S$ has a $t$-distribution. If
$n = 8$, determine $k$ such that $\Pr (\bar{X} - kS < X_9 < \bar{X} + kS) = 0.80$. The
observed interval $(\bar{x} - ks, \bar{x} + ks)$ is often called an 80 percent prediction
interval for $X_9$.

* * * B 11 

6*20. Let Y be 6(300,/?), If the observed value of Y is y = 75, find an 

approximate 90 percent confidence interval for p, 

6^21. Let X be the mean ot a random sample of size n from a distribution that 
is N(ji, (J 2 ), where the positive variance a 2 is known. Use the fact that 
Q>(2) — ^>(—2) = 0,954 to find, for each ^ c,(/i) aed c 2 (ji) such that 
Pr[qOi) <X< c 2 (ji)} = 0.954. Note that c } (fx) and c 2 (/x) are increasing 
functions of fi. Solve for the respectwe functions d { (x) and d 2 (x); thus we 

also have that Pr [^ 2 (^) < ^ < ^i(JT)] — 0.954. Compare this with the 
answer obtained previously in the text. 

6*22. In the notation of the discussion of the confidence interval for /?, show 
that the event — 2 < Z < 2 is equivalent to inequalities (1)* 

Hint; First observe that — 2 < Z < 2 is equivalent to Z 1 < 4, which can 
be written as an inequality involving a quadratic expression in p. 

r … i i 

6.23* Let X denote the mean of a random sample of size 25 from a 
gamma-type distribution with a = 4 and 芦 > 0. Use the central limit 
theorem to find an approximate 0.954 confidence interval for ^ the mean 
of the gamma distribution. 

一 Hint: Base the confidence interval on the random variable 
(JP — 邻 )/(4 於 2 /25 严 = 5Xm - 10- 

6.24. Let $\bar{x}$ be the observed mean of a random sample of size $n$ from a
distribution having mean $\mu$ and known variance $\sigma^2$. Find $n$ so that $\bar{x} - \sigma/4$
to $\bar{x} + \sigma/4$ is an approximate 95 percent confidence interval for $\mu$.

6.25. Assume a binomial model for a certain random variable. If we desire
a 90 percent confidence interval for $p$ that is at most 0.02 in length, find $n$.
Hint: Note that $\sqrt{(y/n)(1 - y/n)} \le \sqrt{(1/2)(1 - 1/2)}$.

6.26. It is known that a random variable $X$ has a Poisson distribution with
parameter $\mu$. A sample of 200 observations from this population has a mean
equal to 3.4. Construct an approximate 90 percent confidence interval
for $\mu$.

6.27. Let $Y_1 < Y_2 < \cdots < Y_n$ denote the order statistics of a random sample
of size $n$ from a distribution that has p.d.f. $f(x) = 3x^2/\theta^3$, $0 < x < \theta$, zero
elsewhere.

(a) Show that $\Pr(c < Y_n/\theta < 1) = 1 - c^{3n}$, where $0 < c < 1$.

(b) If $n$ is 4 and if the observed value of $Y_4$ is 2.3, what is a 95 percent
confidence interval for $\theta$?

6.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$, where both
parameters $\mu$ and $\sigma^2$ are unknown. A confidence interval for $\sigma^2$ can be found
as follows. We know that $nS^2/\sigma^2$ is $\chi^2(n - 1)$. Thus we can find constants
$a$ and $b$ so that $\Pr(nS^2/\sigma^2 < b) = 0.975$ and $\Pr(a < nS^2/\sigma^2 < b) = 0.95$.

(a) Show that this second probability statement can be written as
$\Pr(nS^2/b < \sigma^2 < nS^2/a) = 0.95$.

(b) If $n = 9$ and $s^2 = 7.63$, find a 95 percent confidence interval for $\sigma^2$.

(c) If $\mu$ is known, how would you modify the preceding procedure for
finding a confidence interval for $\sigma^2$?

6.29. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with
known parameter $\alpha = 3$ and unknown $\beta > 0$. Discuss the construction of
a confidence interval for $\beta$.

Hint: What is the distribution of $2\sum_{i=1}^{n} X_i/\beta$? Follow the procedure
outlined in Exercise 6.28.


6.3 Confidence Intervals for Differences of Means

The random variable $T$ may also be used to obtain a confidence
interval for the difference $\mu_1 - \mu_2$ between the means of two normal
distributions, say $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, when the distributions have
the same, but unknown, variance $\sigma^2$.

Remark. Let $X$ have a normal distribution with unknown parameters $\mu_1$
and $\sigma^2$. A modification can be made in conducting the experiment so that the
variance of the distribution will remain the same but the mean of the
distribution will be changed; say, increased. After the modification has been
effected, let the random variable be denoted by $Y$, and let $Y$ have a normal
distribution with unknown parameters $\mu_2$ and $\sigma^2$. Naturally, it is hoped that
$\mu_2$ is greater than $\mu_1$, that is, that $\mu_1 - \mu_2 < 0$. Accordingly, one seeks a
confidence interval for $\mu_1 - \mu_2$ in order to make a statistical inference.


A confidence interval for $\mu_1 - \mu_2$ may be obtained as follows: Let
$X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ denote, respectively, independent
random samples from the two distributions, $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$,
respectively. Denote the means of the samples by $\bar{X}$ and $\bar{Y}$ and the
variances of the samples by $S_1^2$ and $S_2^2$, respectively. It should be noted
that these four statistics are independent. The independence of $\bar{X}$ and
$S_1^2$ (and, inferentially, that of $\bar{Y}$ and $S_2^2$) was established in Section 4.8;
the assumption that the two samples are independent accounts for the
independence of the others. Thus $\bar{X}$ and $\bar{Y}$ are normally and
independently distributed with means $\mu_1$ and $\mu_2$ and variances $\sigma^2/n$ and
$\sigma^2/m$, respectively. In accordance with Section 4.7, their difference
$\bar{X} - \bar{Y}$ is normally distributed with mean $\mu_1 - \mu_2$ and variance
$\sigma^2/n + \sigma^2/m$. Then the random variable

$$\frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\sigma^2/n + \sigma^2/m}}$$

is normally distributed with zero mean and unit variance. This random
variable may serve as the numerator of a $T$ random variable. Further,
$nS_1^2/\sigma^2$ and $mS_2^2/\sigma^2$ have independent chi-square distributions with
$n - 1$ and $m - 1$ degrees of freedom, respectively, so that their sum
$(nS_1^2 + mS_2^2)/\sigma^2$ has a chi-square distribution with $n + m - 2$ degrees
of freedom, provided that $m + n - 2 > 0$. Because of the independence
of $\bar{X}$, $\bar{Y}$, $S_1^2$, and $S_2^2$, it is seen that

$$\sqrt{\frac{nS_1^2 + mS_2^2}{\sigma^2 (n + m - 2)}}$$

may serve as the denominator of a $T$ random variable. That is, the
random variable

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{nS_1^2 + mS_2^2}{n + m - 2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}$$

has a $t$-distribution with $n + m - 2$ degrees of freedom. As in the
previous section, we can (once $n$ and $m$ are specified positive integers
with $n + m - 2 > 0$) find a positive number $b$ from Table IV of
Appendix B such that

$$\Pr(-b < T < b) = 0.95.$$

If we set

$$R = \sqrt{\frac{nS_1^2 + mS_2^2}{n + m - 2}\left(\frac{1}{n} + \frac{1}{m}\right)},$$

this probability may be written in the form

$$\Pr[(\bar{X} - \bar{Y}) - bR < \mu_1 - \mu_2 < (\bar{X} - \bar{Y}) + bR] = 0.95.$$

It follows that the random interval

$$\left[(\bar{X} - \bar{Y}) - b\sqrt{\frac{nS_1^2 + mS_2^2}{n + m - 2}\left(\frac{1}{n} + \frac{1}{m}\right)},\;
(\bar{X} - \bar{Y}) + b\sqrt{\frac{nS_1^2 + mS_2^2}{n + m - 2}\left(\frac{1}{n} + \frac{1}{m}\right)}\right]$$

has probability 0.95 of including the unknown fixed point $(\mu_1 - \mu_2)$. As
usual, the experimental values of $\bar{X}$, $\bar{Y}$, $S_1^2$, and $S_2^2$, namely $\bar{x}$, $\bar{y}$, $s_1^2$, and
$s_2^2$, provide a 95 percent confidence interval for $\mu_1 - \mu_2$ when the
variances of the two normal distributions are unknown but equal. A
consideration of the difficulty encountered when the unknown
variances of the two normal distributions are not equal is assigned to
one of the exercises.


Example 1. It may be verified that if in the preceding discussion $n = 10$,
$m = 7$, $\bar{x} = 4.2$, $\bar{y} = 3.4$, $s_1^2 = 49$, $s_2^2 = 32$, then the interval $(-5.16, 6.76)$ is a
90 percent confidence interval for $\mu_1 - \mu_2$.
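
The arithmetic behind Example 1 can be checked with a short Python sketch. The function name `pooled_t_interval` is a hypothetical helper introduced only for this illustration; the value 1.753 is assumed to be the tabled 95th percentile of the $t$-distribution with 15 degrees of freedom, and the sample variances are taken with divisors $n$ and $m$, as in the text.

```python
import math

def pooled_t_interval(n, m, xbar, ybar, s1_sq, s2_sq, b):
    """Interval (xbar - ybar) -/+ b*R for mu1 - mu2 under equal variances.

    s1_sq and s2_sq are sample variances with divisors n and m (the
    convention of the text); b is a t quantile with n + m - 2 degrees
    of freedom, supplied by the caller from a table.
    """
    pooled = (n * s1_sq + m * s2_sq) / (n + m - 2)
    r = math.sqrt(pooled * (1.0 / n + 1.0 / m))
    diff = xbar - ybar
    return diff - b * r, diff + b * r

# Data of Example 1; b = 1.753 is an assumed table value (t, 15 d.f., 0.95).
low, high = pooled_t_interval(10, 7, 4.2, 3.4, 49.0, 32.0, b=1.753)
print(round(low, 2), round(high, 2))
```

Because $b$ is passed in rather than computed, the same sketch covers other confidence levels once the corresponding table value is supplied.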

Let $Y_1$ and $Y_2$ be two independent random variables with binomial
distributions $b(n_1, p_1)$ and $b(n_2, p_2)$, respectively. Let us now turn to the
problem of finding a confidence interval for the difference $p_1 - p_2$ of
the means of $Y_1/n_1$ and $Y_2/n_2$ when $n_1$ and $n_2$ are known. Since the mean
and the variance of $Y_1/n_1 - Y_2/n_2$ are, respectively, $p_1 - p_2$ and
$p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2$, then the random variable given by the
ratio

$$\frac{(Y_1/n_1 - Y_2/n_2) - (p_1 - p_2)}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}}$$

has mean zero and variance 1 for all positive integers $n_1$ and $n_2$. Moreover,
since both $Y_1$ and $Y_2$ have approximate normal distributions
for large $n_1$ and $n_2$, one suspects that the ratio has an approximate
normal distribution. This is actually the case, but it will not be
proved here. Moreover, if $n_1/n_2 = c$, where $c$ is a fixed positive constant,
the result of Exercise 6.36 shows that the random variable

$$\frac{(Y_1/n_1)(1 - Y_1/n_1)/n_1 + (Y_2/n_2)(1 - Y_2/n_2)/n_2}{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2} \qquad (1)$$

converges in probability to 1 as $n_2 \to \infty$ (and thus $n_1 \to \infty$, since
$n_1/n_2 = c$, $c > 0$). In accordance with Theorem 6, Section 5.5, the
random variable

$$W = \frac{(Y_1/n_1 - Y_2/n_2) - (p_1 - p_2)}{U},$$

where

$$U = \sqrt{(Y_1/n_1)(1 - Y_1/n_1)/n_1 + (Y_2/n_2)(1 - Y_2/n_2)/n_2},$$

has a limiting distribution that is $N(0, 1)$. The event $-2 < W < 2$, the
probability of which is approximately equal to 0.954, is equivalent to
the event

$$\frac{Y_1}{n_1} - \frac{Y_2}{n_2} - 2U < p_1 - p_2 < \frac{Y_1}{n_1} - \frac{Y_2}{n_2} + 2U.$$

Accordingly, the experimental values $y_1$ and $y_2$ of $Y_1$ and $Y_2$,
respectively, will provide an approximate 95.4 percent confidence
interval for $p_1 - p_2$.

Example 2. If, in the preceding discussion, we take $n_1 = 100$, $n_2 = 400$,
$y_1 = 30$, $y_2 = 80$, then the experimental values of $Y_1/n_1 - Y_2/n_2$ and $U$ are 0.1
and $\sqrt{(0.3)(0.7)/100 + (0.2)(0.8)/400} = 0.05$, respectively. Thus the interval
$(0, 0.2)$ is an approximate 95.4 percent confidence interval for $p_1 - p_2$.
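
As a check on Example 2, the following Python sketch evaluates $U$ and the interval $y_1/n_1 - y_2/n_2 \mp 2U$. The function name `diff_of_proportions_interval` and its default multiplier `z=2.0` are illustrative assumptions, not notation from the text.

```python
import math

def diff_of_proportions_interval(y1, n1, y2, n2, z=2.0):
    """Approximate interval for p1 - p2 built from the statistic U."""
    p1_hat = y1 / n1
    p2_hat = y2 / n2
    # U estimates the standard deviation of Y1/n1 - Y2/n2.
    u = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z * u, diff + z * u, u

# Data of Example 2; z = 2 corresponds to the 0.954 probability used above.
low, high, u = diff_of_proportions_interval(30, 100, 80, 400)
print(round(u, 4), round(low, 4), round(high, 4))
```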


EXERCISES 


6.30. Let two independent random samples, each of size 10, from two normal
distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$ yield $\bar{x} = 4.8$, $s_1^2 = 8.64$, $\bar{y} = 5.6$,
$s_2^2 = 7.88$. Find a 95 percent confidence interval for $\mu_1 - \mu_2$.

6.31. Let two independent random variables $Y_1$ and $Y_2$, with binomial
distributions that have parameters $n_1 = n_2 = 100$, $p_1$, and $p_2$, respectively,
be observed to be equal to $y_1 = 50$ and $y_2 = 40$. Determine an approximate
90 percent confidence interval for $p_1 - p_2$.

6.32. Discuss the problem of finding a confidence interval for the difference
$\mu_1 - \mu_2$ between the two means of two normal distributions if the variances
$\sigma_1^2$ and $\sigma_2^2$ are known but not necessarily equal.

6.33. Discuss Exercise 6.32 when it is assumed that the variances are unknown
and unequal. This is a very difficult problem, and the discussion
should point out exactly where the difficulty lies. If, however, the variances
are unknown but their ratio $\sigma_1^2/\sigma_2^2$ is a known constant $k$, then a statistic that
is a $T$ random variable can again be used. Why?

6.34. As an illustration of Exercise 6.33, one can let $X_1, X_2, \ldots$ and
$Y_1, Y_2, \ldots, Y_{12}$ represent two independent random samples from the
respective normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$. It is given that
$\sigma_1^2 = 3\sigma_2^2$, but $\sigma_2^2$ is unknown. Define a random variable which has a
$t$-distribution that can be used to find a 95 percent interval for $\mu_1 - \mu_2$.

6.35. Let $\bar{X}$ and $\bar{Y}$ be the means of two independent random samples, each
of size $n$, from the respective distributions $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, where the
common variance is known. Find $n$ such that

$$\Pr(\bar{X} - \bar{Y} - \sigma/5 < \mu_1 - \mu_2 < \bar{X} - \bar{Y} + \sigma/5) = 0.90.$$

6.36. Under the conditions given, show that the random variable defined by
ratio (1) of the text converges in probability to 1.

6.37. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be two independent random
samples from the respective normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$,
where the four parameters are unknown. To construct a confidence interval
for the ratio, $\sigma_1^2/\sigma_2^2$, of the variances, form the quotient of the two
independent chi-square variables, each divided by its degrees of freedom,
namely

$$F = \frac{mS_2^2/[\sigma_2^2 (m - 1)]}{nS_1^2/[\sigma_1^2 (n - 1)]},$$

where $S_1^2$ and $S_2^2$ are the respective sample variances.

(a) What kind of distribution does $F$ have?

(b) From the appropriate table, $a$ and $b$ can be found so that
$\Pr(F < b) = 0.975$ and $\Pr(a < F < b) = 0.95$.

(c) Rewrite the second probability statement as

$$\Pr\left[a\,\frac{nS_1^2/(n - 1)}{mS_2^2/(m - 1)} < \frac{\sigma_1^2}{\sigma_2^2} < b\,\frac{nS_1^2/(n - 1)}{mS_2^2/(m - 1)}\right] = 0.95.$$

The observed values, $s_1^2$ and $s_2^2$, can be inserted in these inequalities to
provide a 95 percent confidence interval for $\sigma_1^2/\sigma_2^2$.


6.4 Tests of Statistical Hypotheses

The two principal areas of statistical inference are the areas of
estimation of parameters and of tests of statistical hypotheses. The
problem of estimation of parameters, both point and interval estimation,
has been treated. In Sections 6.4 and 6.5 some aspects
of statistical hypotheses and tests of statistical hypotheses will
be considered. The subject will be introduced by way of example.

Example 1. Let it be known that the outcome $X$ of a random experiment
is $N(\theta, 100)$. For instance, $X$ may denote a score on a test, which score
we assume to be normally distributed with mean $\theta$ and variance 100. Let
us say that past experience with this random experiment indicates that
$\theta = 75$. Suppose, owing possibly to some research in the area pertaining to
this experiment, some changes are made in the method of performing
this random experiment. It is then suspected that no longer does $\theta = 75$
but that now $\theta > 75$. There is as yet no formal experimental evidence
that $\theta > 75$; hence the statement $\theta > 75$ is a conjecture or a statistical
hypothesis. In admitting that the statistical hypothesis $\theta > 75$ may be false,
we allow, in effect, the possibility that $\theta \le 75$. Thus there are actually two
statistical hypotheses. First, that the unknown parameter $\theta \le 75$; that is,
there has been no increase in $\theta$. Second, that the unknown parameter
$\theta > 75$. Accordingly, the parameter space is $\Omega = \{\theta : -\infty < \theta < \infty\}$. We
denote the first of these hypotheses by the symbols $H_0: \theta \le 75$ and the
second by the symbols $H_1: \theta > 75$. Since the values $\theta > 75$ are alternatives
to those where $\theta \le 75$, the hypothesis $H_1: \theta > 75$ is called the alternative
hypothesis. Needless to say, $H_0$ could be called the alternative to $H_1$;
however, the conjecture, here $\theta > 75$, that is made by the research worker
is usually taken to be the alternative hypothesis. In any case the problem
is to decide which of these hypotheses is to be accepted. To reach a decision,
the random experiment is to be repeated a number of independent times,
say $n$, and the results observed. That is, we consider a random sample
$X_1, X_2, \ldots, X_n$ from a distribution that is $N(\theta, 100)$, and we devise a rule
that will tell us what decision to make once the experimental values,
say $x_1, x_2, \ldots, x_n$, have been determined. Such a rule is called a test of
the hypothesis $H_0: \theta \le 75$ against the alternative hypothesis $H_1: \theta > 75$.
There is no bound on the number of rules or tests that can be
constructed. We shall consider three such tests. Our tests will be constructed
around the following notion. We shall partition the sample space into a
subset $C$ and its complement $C^*$. If the experimental values of $X_1, X_2, \ldots, X_n$,
say $x_1, x_2, \ldots, x_n$, are such that the point $(x_1, x_2, \ldots, x_n) \in C$, we shall reject
the hypothesis $H_0$ (accept the hypothesis $H_1$). If we have $(x_1, x_2, \ldots, x_n) \in C^*$,
we shall accept the hypothesis $H_0$ (reject the hypothesis $H_1$).


Test 1. Let $n = 25$. The sample space $\mathcal{A}$ is the set

$$\{(x_1, x_2, \ldots, x_{25}) : -\infty < x_i < \infty,\ i = 1, 2, \ldots, 25\}.$$

Let the subset $C$ of the sample space be

$$C = \{(x_1, x_2, \ldots, x_{25}) : x_1 + x_2 + \cdots + x_{25} > (25)(75)\}.$$

We shall reject the hypothesis $H_0$ if and only if our 25 experimental values are
such that $(x_1, x_2, \ldots, x_{25}) \in C$. If $(x_1, x_2, \ldots, x_{25})$ is not an element of $C$, we
shall accept the hypothesis $H_0$. This subset $C$ of the sample space that leads
to the rejection of the hypothesis $H_0: \theta \le 75$ is called the critical region of Test 1.
Now $\sum_{i=1}^{25} x_i > (25)(75)$ if and only if $\bar{x} > 75$, where $\bar{x} = \sum_{i=1}^{25} x_i/25$. Thus we can
much more conveniently say that we shall reject the hypothesis $H_0: \theta \le 75$ and
accept the hypothesis $H_1: \theta > 75$ if and only if the experimentally determined
value of the sample mean $\bar{x}$ is greater than 75. If $\bar{x} \le 75$, we accept the
hypothesis $H_0: \theta \le 75$. Our test then amounts to this: We shall reject the
hypothesis $H_0: \theta \le 75$ if the mean of the sample exceeds the maximum value
of the mean of the distribution when the hypothesis $H_0$ is true.

It would help us to evaluate a test of a statistical hypothesis if we knew
the probability of rejecting that hypothesis (and hence of accepting the
alternative hypothesis). In our Test 1, this means that we want to compute the
probability

$$\Pr[(X_1, \ldots, X_{25}) \in C] = \Pr(\bar{X} > 75).$$

Obviously, this probability is a function of the parameter $\theta$ and we shall denote
it by $K_1(\theta)$. The function $K_1(\theta) = \Pr(\bar{X} > 75)$ is called the power function of
Test 1, and the value of the power function at a parameter point is called the
power of Test 1 at that point. Because $\bar{X}$ is $N(\theta, 4)$, we have

$$K_1(\theta) = \Pr\left(\frac{\bar{X} - \theta}{2} > \frac{75 - \theta}{2}\right) = 1 - \Phi\left(\frac{75 - \theta}{2}\right).$$

So, for illustration, we have, by Table III of Appendix B, that the power at
$\theta = 75$ is $K_1(75) = 0.500$. Other powers are $K_1(73) = 0.159$, $K_1(77) = 0.841$,
and $K_1(79) = 0.977$. The graph of $K_1(\theta)$ of Test 1 is depicted in Figure 6.1.
Among other things, this means that, if $\theta = 75$, the probability of rejecting
the hypothesis $H_0: \theta \le 75$ is $\frac{1}{2}$. That is, if $\theta = 75$ so that $H_0$ is true, the
probability of rejecting this true hypothesis $H_0$ is $\frac{1}{2}$. Many statisticians and
research workers find it very undesirable to have such a high probability as
$\frac{1}{2}$ assigned to this kind of mistake: namely the rejection of $H_0$ when $H_0$ is a true
hypothesis. Thus Test 1 does not appear to be a very satisfactory test. Let us
try to devise another test that does not have this objectionable feature. We
shall do this by making it more difficult to reject the hypothesis $H_0$, with the
hope that this will give a smaller probability of rejecting $H_0$ when that
hypothesis is true.

Test 2. Let $n = 25$. We shall reject the hypothesis $H_0: \theta \le 75$ and accept
the hypothesis $H_1: \theta > 75$ if and only if $\bar{x} > 78$. Here the critical region is
$C = \{(x_1, \ldots, x_{25}) : x_1 + \cdots + x_{25} > (25)(78)\}$. The power function of
Test 2 is, because $\bar{X}$ is $N(\theta, 4)$,

$$K_2(\theta) = \Pr(\bar{X} > 78) = 1 - \Phi\left(\frac{78 - \theta}{2}\right).$$

Some values of the power function of Test 2 are $K_2(73) = 0.006$,
$K_2(75) = 0.067$, $K_2(77) = 0.309$, and $K_2(79) = 0.691$. That is, if $\theta = 75$, the
probability of rejecting $H_0: \theta \le 75$ is 0.067; this is much more desirable than
the corresponding probability $\frac{1}{2}$ that resulted from Test 1. However, if $H_0$ is
false and, in fact, $\theta = 77$, the probability of rejecting $H_0: \theta \le 75$ (and hence
of accepting $H_1: \theta > 75$) is only 0.309. In certain instances, this low
probability 0.309 of a correct decision (the acceptance of $H_1$ when $H_1$ is true)
is objectionable. That is, Test 2 is not wholly satisfactory. Perhaps we can
overcome the undesirable features of Tests 1 and 2 if we proceed as in Test 3.
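
The tabled powers of Tests 1 and 2 can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of Table III; the function name `power` and its argument names are assumptions made for this illustration.

```python
from statistics import NormalDist

def power(theta, cutoff, n=25, sigma=10.0):
    """Pr(Xbar > cutoff) when Xbar is N(theta, sigma^2 / n)."""
    se = sigma / n ** 0.5  # standard deviation of the sample mean
    return 1.0 - NormalDist().cdf((cutoff - theta) / se)

# Columns: theta, K1(theta) with cutoff 75 (Test 1), K2(theta) with cutoff 78 (Test 2).
for theta in (73, 75, 77, 79):
    print(theta, round(power(theta, 75), 3), round(power(theta, 78), 3))
```

Raising the cutoff from 75 to 78 lowers the power at every $\theta$, which is why Test 2 improves on Test 1 at $\theta = 75$ but does worse at $\theta = 77$.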


Test 3. Let us first select a power function $K_3(\theta)$ that has the features of
a small value at $\theta = 75$ and a large value at $\theta = 77$. For instance, take
$K_3(75) = 0.159$ and $K_3(77) = 0.841$. To determine a test with such a power
function, let us reject $H_0: \theta \le 75$ if and only if the experimental value $\bar{x}$ of the
mean of a random sample of size $n$ is greater than some constant $c$. Thus the
critical region is $C = \{(x_1, x_2, \ldots, x_n) : x_1 + x_2 + \cdots + x_n > nc\}$. It
should be noted that the sample size $n$ and the constant $c$ have not been
determined as yet. However, since $\bar{X}$ is $N(\theta, 100/n)$, the power function is

$$K_3(\theta) = \Pr(\bar{X} > c) = 1 - \Phi\left(\frac{c - \theta}{10/\sqrt{n}}\right).$$

The conditions $K_3(75) = 0.159$ and $K_3(77) = 0.841$ require that

$$1 - \Phi\left(\frac{c - 75}{10/\sqrt{n}}\right) = 0.159, \qquad 1 - \Phi\left(\frac{c - 77}{10/\sqrt{n}}\right) = 0.841.$$

Equivalently, from Table III of Appendix B, we have

$$\frac{c - 75}{10/\sqrt{n}} = 1, \qquad \frac{c - 77}{10/\sqrt{n}} = -1.$$

The solution to these two equations in $n$ and $c$ is $n = 100$, $c = 76$. With these
values of $n$ and $c$, other powers of Test 3 are $K_3(73) = 0.001$ and
$K_3(79) = 0.999$. It is important to observe that although Test 3 has a more
desirable power function than those of Tests 1 and 2, a certain "price" has
been paid: a sample size of $n = 100$ is required in Test 3, whereas we had
$n = 25$ in the earlier tests.
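
The two equations of Test 3 can also be solved in code. In the sketch below, `design_one_sided_test` is a hypothetical helper, and `NormalDist().inv_cdf` stands in for Table III. Since the table rounds the normal quantile to 1, the unrounded solution for $n$ falls slightly below 100 and is rounded up.

```python
import math
from statistics import NormalDist

def design_one_sided_test(theta0, power0, theta1, power1, sigma):
    """Solve K(theta0) = power0 and K(theta1) = power1 for (n, c),
    where K(theta) = 1 - Phi((c - theta) / (sigma / sqrt(n)))."""
    z0 = NormalDist().inv_cdf(1.0 - power0)  # equals (c - theta0) / (sigma / sqrt(n))
    z1 = NormalDist().inv_cdf(1.0 - power1)  # equals (c - theta1) / (sigma / sqrt(n))
    # Subtracting the two equations eliminates c and gives sqrt(n).
    root_n = sigma * (z0 - z1) / (theta1 - theta0)
    c = theta0 + z0 * sigma / root_n
    return root_n ** 2, c

n_exact, c = design_one_sided_test(75, 0.159, 77, 0.841, sigma=10.0)
n = math.ceil(n_exact)  # smallest integer sample size meeting both conditions
print(n, round(c, 6))
```

The same helper applies to any pair of power requirements for a one-sided test on a normal mean with known standard deviation.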

Remark. Throughout the text we frequently say that we accept the
hypothesis $H_0$ if we do not reject $H_0$ in favor of $H_1$. If this decision is made,
it certainly does not mean that $H_0$ is true or that we even believe that it is true.
All it means is, based upon the data at hand, that we are not convinced that
the hypothesis $H_0$ is wrong. Accordingly, the statement "We accept $H_0$" would
possibly be better read as "We do not reject $H_0$." However, because it is in
fairly common use, we use the statement "We accept $H_0$," but read it with this
remark in mind.

We have now illustrated the following concepts: 

1. A statistical hypothesis.

2. A test of a hypothesis against an alternative hypothesis and the
associated concept of the critical region of the test.

3. The power of a test.

These concepts will now be formally defined. 

Definition 3. A statistical hypothesis is an assertion about the
distribution of one or more random variables. If the statistical
hypothesis completely specifies the distribution, it is called a simple
statistical hypothesis; if it does not, it is called a composite statistical
hypothesis.

If we refer to Example 1, we see that both $H_0: \theta \le 75$ and
$H_1: \theta > 75$ are composite statistical hypotheses, since neither of them
completely specifies the distribution. If there, instead of $H_0: \theta \le 75$, we
had $H_0: \theta = 75$, then $H_0$ would have been a simple statistical
hypothesis.

Definition 4. A test of a statistical hypothesis is a rule which, when
the experimental sample values have been obtained, leads to a decision
to accept or to reject the hypothesis under consideration.

Definition 5. Let $C$ be that subset of the sample space which, in
accordance with a prescribed test, leads to the rejection of the
hypothesis under consideration. Then $C$ is called the critical region of
the test.

Definition 6. The power function of a test of a statistical hypothesis
$H_0$ against an alternative hypothesis $H_1$ is that function, defined for
all distributions under consideration, which yields the probability that
the sample point falls in the critical region $C$ of the test, that is, a
function that yields the probability of rejecting the hypothesis under
consideration. The value of the power function at a parameter point
is called the power of the test at that point.

Definition 7. Let $H_0$ denote a hypothesis that is to be tested against
an alternative hypothesis $H_1$ in accordance with a prescribed test. The
significance level of the test (or the size of the critical region $C$) is the
maximum value (actually supremum) of the power function of the test
when $H_0$ is true.

If we refer again to Example 1, we see that the significance levels
of Tests 1, 2, and 3 of that example are 0.500, 0.067, and 0.159,
respectively. An additional example may help clarify these definitions.

Example 2. It is known that the random variable $X$ has a p.d.f. of the form

$$f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < \infty,$$
$$\phantom{f(x; \theta)} = 0 \quad \text{elsewhere}.$$

It is desired to test the simple hypothesis $H_0: \theta = 2$ against the alternative
simple hypothesis $H_1: \theta = 4$. Thus $\Omega = \{\theta : \theta = 2, 4\}$. A random sample
$X_1, X_2$ of size $n = 2$ will be used. The test to be used is defined by taking the
critical region to be $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$. The power function
of the test and the significance level of the test will be determined.

There are but two probability density functions under consideration,
namely, $f(x; 2)$ specified by $H_0$ and $f(x; 4)$ specified by $H_1$. Thus the power
function is defined at but two points $\theta = 2$ and $\theta = 4$. The power function of
the test is given by $\Pr[(X_1, X_2) \in C]$. If $H_0$ is true, that is, $\theta = 2$, the joint p.d.f.
of $X_1$ and $X_2$ is

$$f(x_1; 2) f(x_2; 2) = \tfrac{1}{4} e^{-(x_1 + x_2)/2}, \quad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$
$$\phantom{f(x_1; 2) f(x_2; 2)} = 0 \quad \text{elsewhere},$$

and

$$\Pr[(X_1, X_2) \in C] = 1 - \Pr[(X_1, X_2) \in C^*]
= 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \tfrac{1}{4} e^{-(x_1 + x_2)/2}\, dx_1\, dx_2
= 0.05, \text{ approximately}.$$

If $H_1$ is true, that is, $\theta = 4$, the joint p.d.f. of $X_1$ and $X_2$ is

$$f(x_1; 4) f(x_2; 4) = \tfrac{1}{16} e^{-(x_1 + x_2)/4}, \quad 0 < x_1 < \infty,\ 0 < x_2 < \infty,$$
$$\phantom{f(x_1; 4) f(x_2; 4)} = 0 \quad \text{elsewhere},$$

and

$$\Pr[(X_1, X_2) \in C] = 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \tfrac{1}{16} e^{-(x_1 + x_2)/4}\, dx_1\, dx_2
= 0.31, \text{ approximately}.$$

Thus the power of the test is given by 0.05 for $\theta = 2$ and by 0.31 for $\theta = 4$.
That is, the probability of rejecting $H_0$ when $H_0$ is true is 0.05, and the
probability of rejecting $H_0$ when $H_0$ is false is 0.31. Since the significance level
of this test (or the size of the critical region) is the power of the test when $H_0$
is true, the significance level of this test is 0.05.

The fact that the power of this test, when $\theta = 4$, is only 0.31 immediately
suggests that a search be made for another test which, with the same power
when $\theta = 2$, would have a power greater than 0.31 when $\theta = 4$. However, later
it will be clear that such a search would be fruitless. That is, there is no test
with a significance level of 0.05 and based on a random sample of size $n = 2$
that has greater power at $\theta = 4$. The only manner in which the situation may
be improved is to have recourse to a random sample of size $n$ greater than 2.

Our computations of the powers of this test at the two points $\theta = 2$ and
$\theta = 4$ were purposely done the hard way to focus attention on fundamental
concepts. A procedure that is computationally simpler is the following. When
the hypothesis $H_0$ is true, the random variable $X$ is $\chi^2(2)$. Thus the random
variable $X_1 + X_2 = Y$, say, is $\chi^2(4)$. Accordingly, the power of the test when
$H_0$ is true is given by

$$\Pr(Y \ge 9.5) = 1 - \Pr(Y < 9.5) = 1 - 0.95 = 0.05,$$

from Table II of Appendix B. When the hypothesis $H_1$ is true, the random
variable $X/2$ is $\chi^2(2)$; so the random variable $(X_1 + X_2)/2 = Z$, say, is $\chi^2(4)$.
Accordingly, the power of the test when $H_1$ is true is given by

$$\Pr(X_1 + X_2 \ge 9.5) = \Pr(Z \ge 4.75) = \int_{4.75}^{\infty} \tfrac{1}{4} z e^{-z/2}\, dz,$$

which is equal to 0.31, approximately.
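
Both powers in Example 2 can be confirmed without tables. The Python sketch below assumes the closed form $(1 + t/\theta)e^{-t/\theta}$ for $\Pr(X_1 + X_2 \ge t)$, which follows because the sum of two independent exponential variables with mean $\theta$ has a gamma distribution with shape 2. The function name `power` is illustrative only.

```python
import math

def power(theta, t=9.5):
    """Pr(X1 + X2 >= t) for independent exponentials with mean theta.

    X1 + X2 has a gamma distribution with shape 2 and scale theta, whose
    survival function at t is (1 + t/theta) * exp(-t/theta).
    """
    r = t / theta
    return (1.0 + r) * math.exp(-r)

significance_level = power(2)  # power when H0: theta = 2 is true
power_at_4 = power(4)          # power when H1: theta = 4 is true
print(round(significance_level, 2), round(power_at_4, 2))
```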

Remark. The rejection of the hypothesis $H_0$ when that hypothesis is true
is, of course, an incorrect decision or an error. This incorrect decision is often
called a type I error; accordingly, the significance level of the test is the
probability of committing an error of type I. The acceptance of $H_0$ when $H_0$
is false ($H_1$ is true) is called an error of type II. Thus the probability of a
type II error is 1 minus the power of the test when $H_1$ is true. Frequently, it
is disconcerting to the student to discover that there are so many names for
the same thing. However, since all of them are used in the statistical literature,
we feel obligated to point out that "significance level," "size of the critical
region," "power of the test when $H_0$ is true," and "the probability of
committing an error of type I" are all equivalent.

EXERCISES 

6.38. Let $X$ have a p.d.f. of the form $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero
elsewhere, where $\theta \in \{\theta : \theta = 1, 2\}$. To test the simple hypothesis $H_0: \theta = 1$
against the alternative simple hypothesis $H_1: \theta = 2$, use a random
sample $X_1, X_2$ of size $n = 2$ and define the critical region to be
$C = \{(x_1, x_2) : \frac{3}{4} \le x_1 x_2\}$. Find the power function of the test.

6.39. Let $X$ have a binomial distribution with parameters $n = 10$ and
$p \in \{p : p = \frac{1}{4}, \frac{1}{2}\}$. The simple hypothesis $H_0: p = \frac{1}{2}$ is rejected, and the
alternative simple hypothesis $H_1: p = \frac{1}{4}$ is accepted, if the observed value of
$X_1$, a random sample of size 1, is less than or equal to 3. Find the power
function of the test.

6.40. Let $X_1, X_2$ be a random sample of size $n = 2$ from the distribution having
p.d.f. $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere. We reject $H_0: \theta = 2$
and accept $H_1: \theta = 1$ if the observed values of $X_1, X_2$, say $x_1, x_2$, are such
that

$$\frac{f(x_1; 2) f(x_2; 2)}{f(x_1; 1) f(x_2; 1)} \le \frac{1}{2}.$$

Here $\Omega = \{\theta : \theta = 1, 2\}$. Find the significance level of the test and the power
of the test when $H_0$ is false.

6.41. Sketch, as in Figure 6.1, the graphs of the power functions of Tests 1,
2, and 3 of Example 1 of this section.

6.42. Let us assume that the life of a tire in miles, say $X$, is normally distributed
with mean $\theta$ and standard deviation 5000. Past experience indicates that
$\theta = 30{,}000$. The manufacturer claims that the tires made by a new process
have mean $\theta > 30{,}000$, and it is very possible that $\theta = 35{,}000$. Let us check
his claim by testing $H_0: \theta = 30{,}000$ against $H_1: \theta > 30{,}000$. We shall
observe $n$ independent values of $X$, say $x_1, \ldots, x_n$, and we shall reject $H_0$
(thus accept $H_1$) if and only if $\bar{x} \ge c$. Determine $n$ and $c$ so that the power
function $K(\theta)$ of the test has the values $K(30{,}000) = 0.01$ and
$K(35{,}000) = 0.98$.

6.43. Let $X$ have a Poisson distribution with mean $\theta$. Consider the simple
hypothesis $H_0: \theta = \frac{1}{2}$ and the alternative composite hypothesis $H_1: \theta < \frac{1}{2}$.
Thus $\Omega = \{\theta : 0 < \theta \le \frac{1}{2}\}$. Let $X_1, \ldots, X_{12}$ denote a random sample of size
12 from this distribution. We reject $H_0$ if and only if the observed value of
$Y = X_1 + \cdots + X_{12} \le 2$. If $K(\theta)$ is the power function of the test, find the
powers $K(\frac{1}{2})$, $K(\frac{1}{3})$, $K(\frac{1}{4})$, $K(\frac{1}{6})$, and $K(\frac{1}{12})$. Sketch the graph of $K(\theta)$. What is
the significance level of the test?

6.44. Let $Y$ have a binomial distribution with parameters $n$ and $p$. We reject
$H_0: p = \frac{1}{2}$ and accept $H_1: p > \frac{1}{2}$ if $Y \ge c$. Find $n$ and $c$ to give a power
function $K(p)$ which is such that $K(\frac{1}{2}) = 0.10$ and $K(\frac{2}{3}) = 0.95$,
approximately.

6.45. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of size
$n = 4$ from a distribution with p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero
elsewhere, where $0 < \theta$. The hypothesis $H_0: \theta = 1$ is rejected and $H_1: \theta > 1$
accepted if the observed $Y_4 \ge c$.

(a) Find the constant $c$ so that the significance level is $\alpha = 0.05$.

(b) Determine the power function of the test.

6.5 Additional Comments About Statistical Tests 

All of the alternative hypotheses considered in Section 6.4 were
one-sided hypotheses. For illustration, in Exercise 6.42 we tested
$H_0: \theta = 30{,}000$ against the one-sided alternative $H_1: \theta > 30{,}000$,
where $\theta$ is the mean of a normal distribution having standard deviation
$\sigma = 5000$. The test associated with this situation, namely reject $H_0$ if
and only if the sample mean $\bar{X} \ge c$, is a one-sided test. For convenience,
we often call $H_0: \theta = 30{,}000$ the null hypothesis because, as in this
exercise, it suggests that the new process has not changed the mean of
the distribution. That is, the new process has been used without
consequence if in fact the mean still equals 30,000; hence the
terminology null hypothesis is appropriate. So in Exercise 6.42 we are
testing a simple null hypothesis against a composite one-sided
alternative with a one-sided test.

This does suggest that there could be two-sided alternative
hypotheses. For illustration, in Exercise 6.42, suppose there is the
possibility that the new process might decrease the mean. That is, say
that we simply do not know whether with the new process $\theta > 30{,}000$
or $\theta < 30{,}000$; or there has been no change and the null hypothesis
$H_0: \theta = 30{,}000$ is still true. Then we would want to test $H_0: \theta = 30{,}000$
against the two-sided alternative $H_1: \theta \ne 30{,}000$. To help see how to
construct a two-sided test for $H_0$ against $H_1$, consider the following
argument.

In dealing with a test of $H_0: \theta = 30{,}000$ against the one-sided
alternative $\theta > 30{,}000$, we used $\bar{X} \ge c$ or, equivalently,

$$Z = \frac{\bar{X} - 30{,}000}{\sigma/\sqrt{n}} \ge \frac{c - 30{,}000}{\sigma/\sqrt{n}} = c_1,$$

where, since $\bar{X}$ is $N(30{,}000, \sigma^2/n)$ under $H_0$, $Z$ is $N(0, 1)$; and we
could select $c_1 = 1.645$ to have a test of significance level $\alpha = 0.05$.
To test $H_0: \theta = 30{,}000$ against $H_1: \theta \ne 30{,}000$, let us again use $\bar{X}$
through $Z$ and reject $H_0$ if $\bar{X}$ or $Z$ is too large or too small. Namely,
if we reject $H_0$ and accept $H_1$ when

$$|Z| = \left|\frac{\bar{X} - 30{,}000}{\sigma/\sqrt{n}}\right| \ge 1.96,$$

the significance level is $\alpha = 0.05$ because $Z$ is $N(0, 1)$ under $H_0$.

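It is interesting to note that the latter test is the equivalent of
saying that we reject $H_0$ and accept $H_1$ if 30,000 is not in the
(two-sided) confidence interval for the mean $\theta$. Or equivalently, if

$$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < 30{,}000 < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}},$$

then we accept $H_0: \theta = 30{,}000$ because those inequalities are
equivalent to

$$\left|\frac{\bar{x} - 30{,}000}{\sigma/\sqrt{n}}\right| < 1.96,$$

which leads to the acceptance of $H_0: \theta = 30{,}000$.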
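
The equivalence between the two-sided test and the confidence interval can be seen numerically. In the Python sketch below the sample size $n = 25$ and the three sample means are hypothetical values chosen for illustration (only $\sigma = 5000$ and the null value 30,000 come from the text), and the function names are likewise assumptions.

```python
import math

def reject_two_sided(xbar, theta0, sigma, n, z=1.96):
    """Two-sided test: reject H0 when |xbar - theta0| / (sigma/sqrt(n)) >= z."""
    return abs(xbar - theta0) / (sigma / math.sqrt(n)) >= z

def confidence_interval(xbar, sigma, n, z=1.96):
    """Interval xbar -/+ z * sigma / sqrt(n) for the mean, sigma known."""
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample means with n = 25, so sigma / sqrt(n) = 1000.
decisions = {}
for xbar in (28500.0, 31000.0, 32500.0):
    low, high = confidence_interval(xbar, 5000.0, 25)
    inside = low < 30000.0 < high
    rejected = reject_two_sided(xbar, 30000.0, 5000.0, 25)
    decisions[xbar] = (inside, rejected)
    print(xbar, inside, rejected)
```

For each sample mean exactly one of `inside` and `rejected` is true, which is the relationship described above.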

Once we recognize this relationship between confidence intervals
and tests of hypotheses, we can use all those statistics that we used to
construct confidence intervals to test hypotheses, not only against
two-sided alternatives but one-sided ones as well. Without listing all
of these in a table, we give enough of them so that the principle can
be understood.

Example 1. Let $\bar{X}$ and $S^2$ be the mean and the variance of a random
sample of size $n$ coming from $N(\mu, \sigma^2)$. To test, at significance level $\alpha = 0.05$,
$H_0: \mu = \mu_0$ against the two-sided alternative $H_1: \mu \ne \mu_0$, reject if

$$|t| = \frac{|\bar{x} - \mu_0|}{s/\sqrt{n - 1}} \ge b,$$

where $b$ is the 97.5th percentile of the $t$-distribution with $n - 1$ degrees of
freedom.

Example 2. Let independent random samples be taken from $N(\mu_1, \sigma^2)$ and
$N(\mu_2, \sigma^2)$, respectively. Say these have the respective sample characteristics $n$,
$\bar{X}$, $S_1^2$ and $m$, $\bar{Y}$, $S_2^2$. At $\alpha = 0.05$, reject $H_0: \mu_1 = \mu_2$ and accept the one-sided
alternative $H_1: \mu_1 > \mu_2$ if

$$t = \frac{\bar{x} - \bar{y} - 0}{\sqrt{\dfrac{n s_1^2 + m s_2^2}{n + m - 2}\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}} \ge c.$$

Note that $\bar{X} - \bar{Y}$ has a normal distribution with mean zero under $H_0$. So $c$
is taken as the 95th percentile of a $t$-distribution with $n + m - 2$ degrees of
freedom to provide $\alpha = 0.05$.

Example 3. Say $Y$ is $b(n, p)$. To test $H_0: p = p_0$ against $H_1: p < p_0$, we use
either

$$Z_1 = \frac{(Y/n) - p_0}{\sqrt{p_0(1 - p_0)/n}} \le c$$

or

$$Z_2 = \frac{(Y/n) - p_0}{\sqrt{(Y/n)(1 - Y/n)/n}} \le c.$$

If $n$ is large, both $Z_1$ and $Z_2$ have approximate standard normal distributions
provided that $H_0: p = p_0$ is true. Hence $c$ is taken to be $-1.645$ to give an
approximate significance level of $\alpha = 0.05$. Some statisticians use $Z_1$ and
others $Z_2$. We do not have a strong preference one way or the other because
the two methods provide about the same numerical result. As one might
suspect, using $Z_1$ provides better probabilities for power calculations if the
true $p$ is close to $p_0$, while $Z_2$ is better if $H_0$ is clearly false. However, with a
two-sided alternative hypothesis, $Z_2$ does provide a better relationship with
the confidence interval for $p$. That is, $|Z_2| < 2$ is equivalent to $p_0$ being in the
interval from

$$\frac{Y}{n} - 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}} \quad\text{to}\quad \frac{Y}{n} + 2\sqrt{\frac{(Y/n)(1 - Y/n)}{n}},$$

which is the interval that provides a 95.4 percent confidence interval for $p$ as
considered in Section 6.2.
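
To see how close the two statistics of Example 3 usually are, one might compute both for the same data. The sketch below uses hypothetical counts (y = 40 successes in n = 100 trials, p_0 = 0.5); only the formulas for Z_1 and Z_2 come from the text.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_statistics(y, n, p0):
    """Return (Z1, Z2) for testing H0: p = p0 from y successes in n trials."""
    p_hat = y / n
    z1 = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z2 = (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return z1, z2

# Lower 5 percent point of N(0, 1), the constant c of the example.
c = NormalDist().inv_cdf(0.05)

# Hypothetical data for illustration only.
z1, z2 = proportion_z_statistics(y=40, n=100, p0=0.5)
print(c, z1, z2)
```

With these counts both statistics fall below c, so either version leads to rejecting H_0 in favor of p < p_0.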

In closing this section, we introduce the concepts of randomized
tests and p-values through an example and remarks that follow the
example.

Example 4. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size $n = 10$ from
a Poisson distribution with mean $\theta$. A critical region for testing $H_0: \theta = 0.1$
against $H_1: \theta > 0.1$ is given by $Y = \sum_{i=1}^{10} X_i \ge 3$. The statistic $Y$ has a Poisson
distribution with mean $10\theta$. Thus, with $\theta = 0.1$ so that the mean of $Y$ is 1, the
significance level of the test is

$$\Pr(Y \ge 3) = 1 - \Pr(Y \le 2) = 1 - 0.920 = 0.080.$$

If the critical region defined by $\sum_{i=1}^{10} x_i \ge 4$ is used, the significance level is

$$\alpha = \Pr(Y \ge 4) = 1 - \Pr(Y \le 3) = 1 - 0.981 = 0.019.$$

If a significance level of about $\alpha = 0.05$, say, is desired, most statisticians
would use one of these tests; that is, they would adjust the significance level
to that of one of these convenient tests. However, a significance level of
$\alpha = 0.05$ can be achieved exactly by rejecting $H_0$ if $\sum_{i=1}^{10} x_i \ge 4$ or if
$\sum_{i=1}^{10} x_i = 3$ and an auxiliary independent random experiment resulted in
"success," where the probability of success is selected to be equal to

$$\frac{0.050 - 0.019}{0.080 - 0.019} = \frac{31}{61}.$$

This is due to the fact that, when $\theta = 0.1$ so that the mean of $Y$ is 1,

$$\Pr(Y \ge 4) + \Pr(Y = 3 \text{ and success}) = 0.019 + \Pr(Y = 3)\Pr(\text{success}) = 0.019 + (0.061)\tfrac{31}{61} = 0.05.$$

The process of performing the auxiliary experiment to decide whether to reject
or not when $Y = 3$ is sometimes referred to as a randomized test.
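
The arithmetic of Example 4 can be checked with a short sketch. The helper names are hypothetical. The code uses unrounded Poisson probabilities, so the success probability comes out near 0.506 rather than exactly 31/61, which the text obtains from three-decimal table values.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Pr(Y = k) for a Poisson random variable with the given mean."""
    return exp(-mean) * mean**k / factorial(k)

def poisson_upper_tail(k, mean):
    """Pr(Y >= k) for a Poisson random variable with the given mean."""
    return 1.0 - sum(poisson_pmf(j, mean) for j in range(k))

mean = 10 * 0.1                        # mean of Y under H0: theta = 0.1
alpha_3 = poisson_upper_tail(3, mean)  # size of the region Y >= 3
alpha_4 = poisson_upper_tail(4, mean)  # size of the region Y >= 4

# Probability of "success" in the auxiliary experiment used when Y = 3.
target = 0.05
gamma = (target - alpha_4) / poisson_pmf(3, mean)

# Size of the randomized test: reject if Y >= 4, or if Y = 3 and success.
size = alpha_4 + gamma * poisson_pmf(3, mean)
print(round(alpha_3, 3), round(alpha_4, 3), round(gamma, 3), round(size, 3))
```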

Remarks. Not many statisticians like randomized tests in practice,
because the use of them means that two statisticians could make the same
assumptions, observe the same data, apply the same test, and yet make
different decisions. Hence they usually adjust their significance level so as not
to randomize. As a matter of fact, many statisticians report what are
commonly called p-values (for probability values). For illustration, if in
Example 4 the observed $Y$ is $y = 4$, the p-value is 0.019; and if it is $y = 3$, the
p-value is 0.080. That is, the p-value is the observed "tail" probability of a
statistic being at least as extreme as the particular observed value when $H_0$ is
true. Hence, more generally, if $Y = u(X_1, X_2, \ldots, X_n)$ is the statistic to be used
in a test of $H_0$ and if the critical region is of the form

$$u(x_1, x_2, \ldots, x_n) \le c,$$

an observed value $u(x_1, x_2, \ldots, x_n) = d$ would mean that the

$$\text{p-value} = \Pr(Y \le d;\ H_0).$$

That is, if $G(y)$ is the distribution function of $Y = u(X_1, \ldots, X_n)$, provided
that $H_0$ is true, the p-value is equal to $G(d)$ in this case. However,








$G(Y)$, in the continuous case, is uniformly distributed on the unit interval, so
an observed value $G(d) \le 0.05$ would be equivalent to selecting $c$ so that

$$\Pr[u(X_1, X_2, \ldots, X_n) \le c;\ H_0] = 0.05$$

and observing that $d \le c$. Most computer programs automatically print out
the p-value of a test.

Example 5. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from $N(\mu, \sigma^2 = 4)$.
To test $H_0: \mu = 77$ against the one-sided alternative hypothesis $H_1: \mu < 77$,
say we observe the 25 values and determine that $\bar{x} = 76.1$. The
variance of $\bar{X}$ is $\sigma^2/n = 4/25 = 0.16$; so we know that $Z = (\bar{X} - 77)/0.4$
is $N(0, 1)$ provided that $\mu = 77$. Since the observed value of this test statistic
is $z = (76.1 - 77)/0.4 = -2.25$, the p-value of the test is
$\Phi(-2.25) = 1 - 0.988 = 0.012$. Accordingly, if we were using a significance level of
$\alpha = 0.05$, we would reject $H_0$ and accept $H_1: \mu < 77$ because $0.012 < 0.05$.
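
The p-value in Example 5 is a single lower-tail normal probability, which can be reproduced as a check. The function name below is hypothetical; the numbers are those of the example.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_p_value(xbar, mu0, sigma2, n):
    """Return (z, p-value) for H0: mu = mu0 against H1: mu < mu0, known variance."""
    z = (xbar - mu0) / sqrt(sigma2 / n)  # standardized sample mean
    return z, NormalDist().cdf(z)        # Pr(Z <= z) when H0 is true

z, p_value = lower_tail_p_value(xbar=76.1, mu0=77, sigma2=4, n=25)
print(round(z, 2), round(p_value, 3))
```

Because the p-value is below 0.05, the code agrees with the decision to reject H_0 at that level.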

EXERCISES 

6.46. Assume that the weight of cereal in a "10-ounce box" is $N(\mu, \sigma^2)$. To
test $H_0: \mu = 10.1$ against $H_1: \mu > 10.1$, we take a random sample of size
$n = 16$ and observe that $\bar{x} = 10.4$ and $s = 0.4$.

(a) Do we accept or reject $H_0$ at the 5 percent significance level?

(b) What is the approximate p-value of this test?

6.47. Each of 51 golfers hit three golf balls of brand X and three golf balls
of brand Y in a random order. Let $X_i$ and $Y_i$ equal the averages of the
distances traveled by the brand X and brand Y golf balls hit by the $i$th golfer,
$i = 1, 2, \ldots, 51$. Let $W_i = X_i - Y_i$, $i = 1, 2, \ldots, 51$. Test $H_0: \mu_W = 0$
against $H_1: \mu_W > 0$, where $\mu_W$ is the mean of the differences. If $\bar{w} = 2.07$ and
$s_W^2 = 84.63$, would $H_0$ be accepted or rejected at an $\alpha = 0.05$ significance
level? What is the p-value of this test?

6.48. Among the data collected for the World Health Organization air quality
monitoring project is a measure of suspended particles in $\mu g/m^3$. Let $X$ and
$Y$ equal the concentration of suspended particles in $\mu g/m^3$ in the city center
(commercial district) for Melbourne and Houston, respectively. Using
$n = 13$ observations of $X$ and $m = 16$ observations of $Y$, we shall test
$H_0: \mu_X = \mu_Y$ against $H_1: \mu_X < \mu_Y$.

(a) Define the test statistic and critical region, assuming that the variances
are equal. Let $\alpha = 0.05$.

(b) If $\bar{x} = 72.9$, $s_x = 25.6$, $\bar{y} = 81.7$, and $s_y = 28.3$, calculate the value of the
test statistic and state your conclusion.

6.49. Let $p$ equal the proportion of drivers who use a seat belt in a state that
If we change the variable of integration in this last integral by writing
$w^2 = v$, then

$$\lim_{n\to\infty} H_n(z) = \int_0^z \frac{1}{\Gamma(\tfrac12)\,2^{1/2}}\, v^{1/2 - 1} e^{-v/2}\, dv,$$

provided that $z \ge 0$. If $z < 0$, then $\lim H_n(z) = 0$. Thus $\lim H_n(z)$ is
equal to the distribution function of a random variable that is $\chi^2(1)$.
This is the desired result.

Let us now return to the random variable $X_1$ which is $b(n, p_1)$. Let
$X_2 = n - X_1$ and let $p_2 = 1 - p_1$. If we denote $Y^2$ by $Q_1$ instead of $Z$,
we see that $Q_1$ may be written as

$$Q_1 = \frac{(X_1 - np_1)^2}{np_1(1 - p_1)} = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_1 - np_1)^2}{n(1 - p_1)} = \frac{(X_1 - np_1)^2}{np_1} + \frac{(X_2 - np_2)^2}{np_2}$$

because $(X_1 - np_1)^2 = (n - X_2 - n + np_2)^2 = (X_2 - np_2)^2$. Since $Q_1$ has
a limiting chi-square distribution with 1 degree of freedom, we say,
when $n$ is a positive integer, that $Q_1$ has an approximate chi-square
distribution with 1 degree of freedom. This result can be generalized
as follows.
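
The algebraic identity for $Q_1$ can be confirmed exactly with rational arithmetic. The values n = 20 and p_1 = 3/10 below are arbitrary illustrations, and the function names are hypothetical.

```python
from fractions import Fraction

def q1_single_term(x1, n, p1):
    """(X1 - n p1)^2 / (n p1 (1 - p1)), the square of the standardized binomial."""
    return (x1 - n * p1) ** 2 / (n * p1 * (1 - p1))

def q1_two_cells(x1, n, p1):
    """Sum over the two cells of (X_i - n p_i)^2 / (n p_i)."""
    x2, p2 = n - x1, 1 - p1
    return (x1 - n * p1) ** 2 / (n * p1) + (x2 - n * p2) ** 2 / (n * p2)

# Arbitrary illustrative values; Fractions keep the comparison exact.
n, p1 = 20, Fraction(3, 10)
identity_holds = all(
    q1_single_term(x, n, p1) == q1_two_cells(x, n, p1) for x in range(n + 1)
)
print(identity_holds)
```

The two-cell form is the one that generalizes to k cells in the multinomial case.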

Let $X_1, X_2, \ldots, X_{k-1}$ have a multinomial distribution with the
parameters $n, p_1, \ldots, p_{k-1}$, as in Section 3.1. As a convenience, let
$X_k = n - (X_1 + \cdots + X_{k-1})$ and let $p_k = 1 - (p_1 + \cdots + p_{k-1})$.
Define $Q_{k-1}$ by

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}.$$

It is proved in a more advanced course that, as $n \to \infty$, $Q_{k-1}$ has a
limiting distribution that is $\chi^2(k - 1)$. If we accept this fact, we can say
that $Q_{k-1}$ has an approximate chi-square distribution with $k - 1$
degrees of freedom when $n$ is a positive integer. Some writers caution
the user of this approximation to be certain that $n$ is large enough that
each $np_i$, $i = 1, 2, \ldots, k$, is at least equal to 5. In any case it is important
to realize that $Q_{k-1}$ does not have a chi-square distribution, only an
approximate chi-square distribution.

The random variable $Q_{k-1}$ may serve as the basis of the tests of
certain statistical hypotheses which we now discuss. Let the sample
space $\mathcal{A}$ of a random experiment be the union of a finite number
$k$ of mutually disjoint sets $A_1, A_2, \ldots, A_k$. Furthermore, let $P(A_i) = p_i$,
$i = 1, 2, \ldots, k$, where $p_k = 1 - p_1 - \cdots - p_{k-1}$, so that $p_i$ is the
probability that the outcome of the random experiment is an element
of the set $A_i$. The random experiment is to be repeated $n$ independent
times and $X_i$ will represent the number of times the outcome is an
element of the set $A_i$. That is, $X_1, X_2, \ldots, X_k = n - X_1 - \cdots - X_{k-1}$
are the frequencies with which the outcome is, respectively, an element
of $A_1, A_2, \ldots, A_k$. Then the joint p.d.f. of $X_1, X_2, \ldots, X_{k-1}$ is the
multinomial p.d.f. with the parameters $n, p_1, \ldots, p_{k-1}$. Consider the
simple hypothesis (concerning this multinomial p.d.f.) $H_0: p_1 = p_{10}$,
$p_2 = p_{20}, \ldots, p_{k-1} = p_{k-1,0}$ ($p_k = p_{k0} = 1 - p_{10} - \cdots - p_{k-1,0}$), where
$p_{10}, \ldots, p_{k-1,0}$ are specified numbers. It is desired to test $H_0$ against all
alternatives.

If the hypothesis $H_0$ is true, the random variable

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_{i0})^2}{np_{i0}}$$

has an approximate chi-square distribution with $k - 1$ degrees of
freedom. Since, when $H_0$ is true, $np_{i0}$ is the expected value of $X_i$, one
would feel intuitively that experimental values of $Q_{k-1}$ should not be
too large if $H_0$ is true. With this in mind, we may use Table II of
Appendix B, with $k - 1$ degrees of freedom, and find $c$ so that
$\Pr(Q_{k-1} \ge c) = \alpha$, where $\alpha$ is the desired significance level of the test.
If, then, the hypothesis $H_0$ is rejected when the observed value of $Q_{k-1}$
is at least as great as $c$, the test of $H_0$ will have a significance level that
is approximately equal to $\alpha$.

Some illustrative examples follow. 

Example 1. One of the first six positive integers is to be chosen by a
random experiment (perhaps by the cast of a die). Let $A_i = \{x: x = i\}$,
$i = 1, 2, \ldots, 6$. The hypothesis $H_0: P(A_i) = p_{i0} = \tfrac16$, $i = 1, 2, \ldots, 6$, will be
tested, at the approximate 5 percent significance level, against all alternatives.
To make the test, the random experiment will be repeated, under the same
conditions, 60 independent times. In this example $k = 6$ and $np_{i0} = 60(\tfrac16) = 10$,
$i = 1, 2, \ldots, 6$. Let $X_i$ denote the frequency with which the random
experiment terminates with the outcome in $A_i$, $i = 1, 2, \ldots, 6$, and let
$Q_5 = \sum_{i=1}^{6} (X_i - 10)^2/10$. If $H_0$ is true, Table II, with $k - 1 = 6 - 1 = 5$ degrees
of freedom, shows that we have $\Pr(Q_5 \ge 11.1) = 0.05$. Now suppose that
does not have a mandatory seat belt law. It was claimed that $p = 0.14$. An
advertising campaign was conducted to increase this proportion. Two
months after the campaign, $y = 104$ out of a random sample of $n = 590$
drivers were wearing their seat belts. Was the campaign successful?

(a) Define the null and alternative hypotheses.

(b) Define a critical region with an $\alpha = 0.01$ significance level.

(c) Determine the approximate p-value and state your conclusion.

6.50. A machine shop that manufactures toggle levers has both a day
and a night shift. A toggle lever is defective if a standard nut cannot be
screwed onto the threads. Let $p_1$ and $p_2$ be the proportion of defective levers
among those manufactured by the day and night shifts, respectively. We
shall test the null hypothesis, $H_0: p_1 = p_2$, against a two-sided alternative
hypothesis based on two random samples, each of 1000 levers taken from
the production of the respective shifts.

(a) Define the test statistic which has an approximate $N(0, 1)$ distribution.
Sketch a standard normal p.d.f. illustrating the critical region having
$\alpha = 0.05$.

(b) If $y_1 = 37$ and $y_2 = 53$ defectives were observed for the day and night
shifts, respectively, calculate the value of the test statistic and the
approximate p-value (note that this is a two-sided test). Locate the
calculated test statistic on your figure in part (a) and state your
conclusion.

6.51. In Exercise 6.28 we found a confidence interval for the variance $\sigma^2$ using
the variance $S^2$ of a random sample of size $n$ arising from $N(\mu, \sigma^2)$, where
the mean $\mu$ is unknown. In testing $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 > \sigma_0^2$, use the
critical region defined by $nS^2/\sigma_0^2 \ge c$. That is, reject $H_0$ and accept $H_1$ if
$S^2 \ge c\sigma_0^2/n$. If $n = 13$ and the significance level $\alpha = 0.025$, determine $c$.

6.52. In Exercise 6.37, in finding a confidence interval for the ratio of
the variances of two normal distributions, we used a statistic
$[nS_1^2/(n - 1)]/[mS_2^2/(m - 1)]$, which has an F-distribution when those two
variances are equal. If we denote that statistic by $F$, we can test $H_0: \sigma_1^2 = \sigma_2^2$
against $H_1: \sigma_1^2 > \sigma_2^2$ using the critical region $F \ge c$. If $n = 13$, $m = 11$, and
$\alpha = 0.05$, find $c$.

6.6 Chi-Square Tests 

In this section we introduce tests of statistical hypotheses called
chi-square tests. A test of this sort was originally proposed by Karl
Pearson in 1900, and it provided one of the earlier methods of statistical
inference.

Let the random variable $X_i$ be $N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, and let
$X_1, X_2, \ldots, X_n$ be mutually independent. Thus the joint p.d.f. of these
variables is

$$\frac{1}{\sigma_1\sigma_2\cdots\sigma_n(2\pi)^{n/2}} \exp\left[-\frac12 \sum_{i=1}^{n} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right], \qquad -\infty < x_i < \infty.$$

The random variable that is defined by the exponent (apart from
the coefficient $-\tfrac12$) is $\sum_{i=1}^{n} (X_i - \mu_i)^2/\sigma_i^2$, and this random variable is $\chi^2(n)$.

In Section 4.10 we generalized this joint normal distribution
of probability to $n$ random variables that are dependent and we call the
distribution a multivariate normal distribution. In Section 10.8, it will
be shown that a certain exponent in the joint p.d.f. (apart from a
coefficient of $-\tfrac12$) defines a random variable that is $\chi^2(n)$. This fact is
the mathematical basis of the chi-square tests.

Let us now discuss some random variables that have approximate
chi-square distributions. Let $X_1$ be $b(n, p_1)$. Since the random variable
$Y = (X_1 - np_1)/\sqrt{np_1(1 - p_1)}$ has, as $n \to \infty$, a limiting distribution
that is $N(0, 1)$, we would strongly suspect that the limiting distribution
of $Z = Y^2$ is $\chi^2(1)$. This is, in fact, the case, as will now be shown. If
$G_n(y)$ represents the distribution function of $Y$, we know that

$$\lim_{n\to\infty} G_n(y) = \Phi(y), \qquad -\infty < y < \infty,$$

where $\Phi(y)$ is the distribution function of a distribution that is $N(0, 1)$.
Let $H_n(z)$ represent, for each positive integer $n$, the distribution
function of $Z = Y^2$. Thus, if $z \ge 0$,

$$H_n(z) = \Pr(Z \le z) = \Pr(-\sqrt{z} \le Y \le \sqrt{z}) = G_n(\sqrt{z}) - G_n[(-\sqrt{z})-].$$

Accordingly, since $\Phi(y)$ is everywhere continuous,

$$\lim_{n\to\infty} H_n(z) = \Phi(\sqrt{z}) - \Phi(-\sqrt{z}) = 2\int_0^{\sqrt{z}} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw.$$
the experimental frequencies of $A_1, A_2, \ldots, A_6$ are, respectively, 13, 19, 11,
8, 5, and 4. The observed value of $Q_5$ is

$$\frac{(13 - 10)^2}{10} + \frac{(19 - 10)^2}{10} + \frac{(11 - 10)^2}{10} + \frac{(8 - 10)^2}{10} + \frac{(5 - 10)^2}{10} + \frac{(4 - 10)^2}{10} = 15.6.$$

Since $15.6 > 11.1$, the hypothesis $P(A_i) = \tfrac16$, $i = 1, 2, \ldots, 6$, is rejected at the
(approximate) 5 percent significance level.
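
The computation in this die example is easy to reproduce. The sketch below uses a hypothetical helper, pearson_statistic, with the observed frequencies of the example and the tabled constant 11.1 quoted in the text.

```python
def pearson_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [13, 19, 11, 8, 5, 4]   # frequencies from the example
expected = [60 * (1 / 6)] * 6      # n * p_i0 = 10 in every cell
q5 = pearson_statistic(observed, expected)

critical_value = 11.1              # Table II, 5 degrees of freedom, alpha = 0.05
reject = q5 >= critical_value
print(round(q5, 1), reject)
```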

Example 2. A point is to be selected from the unit interval $\{x: 0 < x < 1\}$
by a random process. Let $A_1 = \{x: 0 < x \le \tfrac14\}$, $A_2 = \{x: \tfrac14 < x \le \tfrac12\}$,
$A_3 = \{x: \tfrac12 < x \le \tfrac34\}$, and $A_4 = \{x: \tfrac34 < x < 1\}$. Let the probabilities $p_i$, $i = 1, 2, 3, 4$,
assigned to these sets under the hypothesis be determined by the p.d.f. $2x$,
$0 < x < 1$, zero elsewhere. Then these probabilities are, respectively,

$$p_{10} = \int_0^{1/4} 2x\, dx = \tfrac{1}{16}, \qquad p_{20} = \tfrac{3}{16}, \qquad p_{30} = \tfrac{5}{16}, \qquad p_{40} = \tfrac{7}{16}.$$

Thus the hypothesis to be tested is that $p_1$, $p_2$, $p_3$, and $p_4 = 1 - p_1 - p_2 - p_3$
have the preceding values in a multinomial distribution with $k = 4$. This
hypothesis is to be tested at an approximate 0.025 significance level by
repeating the random experiment $n = 80$ independent times under the same
conditions. Here the $np_{i0}$, $i = 1, 2, 3, 4$, are, respectively, 5, 15, 25, and 35.
Suppose the observed frequencies of $A_1$, $A_2$, $A_3$, and $A_4$ are 6, 18, 20, and
36, respectively. Then the observed value of $Q_3 = \sum_{i=1}^{4} (x_i - np_{i0})^2/(np_{i0})$ is

$$\frac{(6 - 5)^2}{5} + \frac{(18 - 15)^2}{15} + \frac{(20 - 25)^2}{25} + \frac{(36 - 35)^2}{35} = \frac{64}{35} = 1.83,$$

approximately. From Table II, with $4 - 1 = 3$ degrees of freedom, the value
corresponding to a 0.025 significance level is $c = 9.35$. Since the observed
value of $Q_3$ is less than 9.35, the hypothesis is accepted at the (approximate)
0.025 level of significance.
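
In this second example the cell probabilities come from integrating a hypothesized p.d.f. over each set. A minimal sketch of that step, using the distribution function $F(x) = x^2$ of the p.d.f. $2x$ and exact fractions, might look as follows (the variable names are hypothetical):

```python
from fractions import Fraction

def hypothesized_cdf(x):
    """Distribution function of the p.d.f. 2x on (0, 1): F(x) = x^2."""
    return x * x

# Cell boundaries 0, 1/4, 1/2, 3/4, 1 and the resulting probabilities p_i0.
cuts = [Fraction(i, 4) for i in range(5)]
probs = [hypothesized_cdf(b) - hypothesized_cdf(a) for a, b in zip(cuts, cuts[1:])]

n = 80
expected = [n * p for p in probs]   # expected frequencies n * p_i0
observed = [6, 18, 20, 36]          # frequencies from the example
q3 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

accept = q3 < Fraction(935, 100)    # compare with c = 9.35 from Table II
print(probs, expected, q3, accept)
```

Exact fractions reproduce the value 64/35 without rounding.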


Thus far we have used the chi-square test when the hypothesis $H_0$
is a simple hypothesis. More often we encounter hypotheses $H_0$ in
which the multinomial probabilities $p_1, p_2, \ldots, p_k$ are not completely
specified by the hypothesis $H_0$. That is, under $H_0$, these probabilities
are functions of unknown parameters. For illustration, suppose that
a certain random variable $Y$ can take on any real value. Let us partition
the space $\{y: -\infty < y < \infty\}$ into $k$ mutually disjoint sets
$A_1, A_2, \ldots, A_k$ so that the events $A_1, A_2, \ldots, A_k$ are mutually exclusive
and exhaustive. Let $H_0$ be the hypothesis that $Y$ is $N(\mu, \sigma^2)$ with
$\mu$ and $\sigma^2$ unspecified. Then each

$$p_i = \int_{A_i} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(y - \mu)^2}{2\sigma^2}\right] dy, \qquad i = 1, 2, \ldots, k,$$

is a function of the unknown parameters $\mu$ and $\sigma^2$. Suppose that we take
a random sample $Y_1, \ldots, Y_n$ of size $n$ from this distribution. If we let
$X_i$ denote the frequency of $A_i$, $i = 1, 2, \ldots, k$, so that
$X_1 + \cdots + X_k = n$, the random variable

$$Q_{k-1} = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}$$

cannot be computed once $X_1, \ldots, X_k$ have been observed, since each
$p_i$, and hence $Q_{k-1}$, is a function of the unknown parameters $\mu$ and $\sigma^2$.
There is a way out of our trouble, however. We have noted that
$Q_{k-1}$ is a function of $\mu$ and $\sigma^2$. Accordingly, choose the values of $\mu$ and
$\sigma^2$ that minimize $Q_{k-1}$. Obviously, these values depend upon the
observed $X_1 = x_1, \ldots, X_k = x_k$ and are called minimum chi-square
estimates of $\mu$ and $\sigma^2$. These point estimates of $\mu$ and $\sigma^2$ enable us to
compute numerically the estimates of each $p_i$. Accordingly, if these
values are used, $Q_{k-1}$ can be computed once $Y_1, Y_2, \ldots, Y_n$, and hence
$X_1, X_2, \ldots, X_k$, are observed. However, a very important aspect of the
fact, which we accept without proof, is that now $Q_{k-1}$ is approximately
$\chi^2(k - 3)$. That is, the number of degrees of freedom of the limiting
chi-square distribution of $Q_{k-1}$ is reduced by one for each parameter
estimated by the experimental data. This statement applies not only to
the problem at hand but also to more general situations. Two examples
will now be given. The first of these examples will deal with the test of
the hypothesis that two multinomial distributions are the same.

Remark. In many instances, such as that involving the mean $\mu$ and the
variance $\sigma^2$ of a normal distribution, minimum chi-square estimates are
difficult to compute. Hence other estimates, such as the maximum likelihood
estimates $\hat{\mu} = \bar{Y}$ and $\hat{\sigma}^2 = S^2$, are used to evaluate $p_i$ and $Q_{k-1}$. In general,
$Q_{k-1}$ is not minimized by maximum likelihood estimates, and thus its
computed value is somewhat greater than it would be if minimum chi-square
estimates were used. Hence, when comparing it to a critical value listed in the
chi-square table with $k - 3$ degrees of freedom, there is a greater chance of
rejecting than there would be if the actual minimum of $Q_{k-1}$ is used.
Accordingly, the approximate significance level of such a test will be somewhat
higher than that value found in the table. This modification should be
kept in mind and, if at all possible, each $p_i$ should be estimated using the
frequencies $X_1, \ldots, X_k$ rather than using directly the observations
$Y_1, Y_2, \ldots, Y_n$ of the random sample.

Example 3. Let us consider two multinomial distributions with parameters
$n_j, p_{1j}, p_{2j}, \ldots, p_{kj}$, $j = 1, 2$, respectively. Let $X_{ij}$, $i = 1, 2, \ldots, k$,
$j = 1, 2$, represent the corresponding frequencies. If $n_1$ and $n_2$ are large and the
observations from one distribution are independent of those from the other,
the random variable

$$\sum_{j=1}^{2} \sum_{i=1}^{k} \frac{(X_{ij} - n_j p_{ij})^2}{n_j p_{ij}}$$

is the sum of two independent random variables, each of which we treat as
though it were $\chi^2(k - 1)$; that is, the random variable is approximately
$\chi^2(2k - 2)$. Consider the hypothesis

$$H_0: p_{11} = p_{12},\ p_{21} = p_{22},\ \ldots,\ p_{k1} = p_{k2},$$

where each $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$, is unspecified. Thus we need point
estimates of these parameters. The maximum likelihood estimator of $p_{i1} = p_{i2}$,
based upon the frequencies $X_{ij}$, is $(X_{i1} + X_{i2})/(n_1 + n_2)$, $i = 1, 2, \ldots, k$. Note
that we need only $k - 1$ point estimates, because we have a point estimate of
$p_{k1} = p_{k2}$ once we have point estimates of the first $k - 1$ probabilities. In
accordance with the fact that has been stated, the random variable

$$\sum_{j=1}^{2} \sum_{i=1}^{k} \frac{\{X_{ij} - n_j[(X_{i1} + X_{i2})/(n_1 + n_2)]\}^2}{n_j[(X_{i1} + X_{i2})/(n_1 + n_2)]}$$

has an approximate $\chi^2$ distribution with $2k - 2 - (k - 1) = k - 1$ degrees of
freedom. Thus we are able to test the hypothesis that two multinomial
distributions are the same; this hypothesis is rejected when the computed value
of this random variable is at least as great as an appropriate number from
Table II, with $k - 1$ degrees of freedom.
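
The statistic of Example 3 can be sketched as below. The function name and the two frequency vectors are hypothetical; they are illustrative counts only, not data from the text. The critical value would still come from a chi-square table with k - 1 degrees of freedom.

```python
def homogeneity_statistic(counts1, counts2):
    """Chi-square statistic for H0: two multinomial distributions are the same.

    Each common cell probability is estimated by (X_i1 + X_i2) / (n1 + n2).
    """
    n1, n2 = sum(counts1), sum(counts2)
    stat = 0.0
    for x1, x2 in zip(counts1, counts2):
        p_hat = (x1 + x2) / (n1 + n2)   # pooled estimate of p_i1 = p_i2
        stat += (x1 - n1 * p_hat) ** 2 / (n1 * p_hat)
        stat += (x2 - n2 * p_hat) ** 2 / (n2 * p_hat)
    return stat, len(counts1) - 1       # statistic and k - 1 degrees of freedom

# Illustrative counts only (k = 3 cells, n1 = n2 = 60).
stat, df = homogeneity_statistic([10, 20, 30], [20, 20, 20])
print(round(stat, 3), df)
```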

The second example deals with the subject of contingency tables. 

Example 4. Let the result of a random experiment be classified by two
attributes (such as the color of the hair and the color of the eyes). That is, one
attribute of the outcome is one and only one of certain mutually exclusive and
exhaustive events, say $A_1, A_2, \ldots, A_a$; and the other attribute of the outcome
is also one and only one of certain mutually exclusive and exhaustive events,
say $B_1, B_2, \ldots, B_b$. Let $p_{ij} = P(A_i \cap B_j)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$.
The random experiment is to be repeated $n$ independent times
and $X_{ij}$ will denote the frequency of the event $A_i \cap B_j$. Since there are $k = ab$
such events as $A_i \cap B_j$, the random variable

$$Q_{ab-1} = \sum_{j=1}^{b} \sum_{i=1}^{a} \frac{(X_{ij} - np_{ij})^2}{np_{ij}}$$

has an approximate chi-square distribution with $ab - 1$ degrees of freedom,
provided that $n$ is large. Suppose that we wish to test the independence of
the A attribute and the B attribute; that is, we wish to test the hypothesis
$H_0: P(A_i \cap B_j) = P(A_i)P(B_j)$, $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$. Let us denote
$P(A_i)$ by $p_{i\cdot}$ and $P(B_j)$ by $p_{\cdot j}$; thus

$$p_{i\cdot} = \sum_{j=1}^{b} p_{ij}, \qquad p_{\cdot j} = \sum_{i=1}^{a} p_{ij},$$

and

$$1 = \sum_{j=1}^{b} \sum_{i=1}^{a} p_{ij} = \sum_{j=1}^{b} p_{\cdot j} = \sum_{i=1}^{a} p_{i\cdot}.$$

Then the hypothesis can be formulated as $H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j}$, $i = 1, 2, \ldots, a$;
$j = 1, 2, \ldots, b$. To test $H_0$, we can use $Q_{ab-1}$ with $p_{ij}$ replaced by $p_{i\cdot}\, p_{\cdot j}$.
But if $p_{i\cdot}$, $i = 1, 2, \ldots, a$, and $p_{\cdot j}$, $j = 1, 2, \ldots, b$, are unknown, as they
frequently are in applications, we cannot compute $Q_{ab-1}$ once the frequencies
are observed. In such a case we estimate these unknown parameters by

$$\hat{p}_{i\cdot} = \frac{X_{i\cdot}}{n}, \quad\text{where } X_{i\cdot} = \sum_{j=1}^{b} X_{ij}, \quad i = 1, 2, \ldots, a,$$

and

$$\hat{p}_{\cdot j} = \frac{X_{\cdot j}}{n}, \quad\text{where } X_{\cdot j} = \sum_{i=1}^{a} X_{ij}, \quad j = 1, 2, \ldots, b.$$

Since $\sum_i p_{i\cdot} = \sum_j p_{\cdot j} = 1$, we have estimated only $a - 1 + b - 1 = a + b - 2$
parameters. So if these estimates are used in $Q_{ab-1}$, with $p_{ij} = p_{i\cdot}\, p_{\cdot j}$, then,
according to the rule that has been stated in this section, the random variable

$$\sum_{j=1}^{b} \sum_{i=1}^{a} \frac{[X_{ij} - n(X_{i\cdot}/n)(X_{\cdot j}/n)]^2}{n(X_{i\cdot}/n)(X_{\cdot j}/n)}$$

has an approximate chi-square distribution with $ab - 1 - (a + b - 2) =
(a - 1)(b - 1)$ degrees of freedom provided that $H_0$ is true. The hypothesis $H_0$
is then rejected if the computed value of this statistic exceeds the constant $c$,
where $c$ is selected from Table II so that the test has the desired significance
level $\alpha$.
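
A minimal sketch of the contingency-table statistic of Example 4 follows. The function name and the 2-by-2 table are hypothetical illustrations, not data from the text; the row and column totals play the roles of $X_{i\cdot}$ and $X_{\cdot j}$.

```python
def independence_statistic(table):
    """Chi-square statistic for independence in an a-by-b frequency table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # X_i.
    col_totals = [sum(col) for col in zip(*table)]  # X_.j
    stat = 0.0
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            # Estimated expected frequency n * (X_i./n) * (X_.j/n).
            expected = row_totals[i] * col_totals[j] / n
            stat += (x - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)     # (a - 1)(b - 1)
    return stat, df

# Illustrative 2-by-2 table only.
stat, df = independence_statistic([[10, 20], [20, 10]])
print(round(stat, 3), df)
```

The hypothesis of independence would be rejected when stat exceeds the tabled constant c for df degrees of freedom.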

In each of the four examples of this section, we have indicated that
the statistic used to test the hypothesis $H_0$ has an approximate
chi-square distribution, provided that $n$ is sufficiently large and $H_0$ is
true. To compute the power of any of these tests for values of the
parameters not described by $H_0$, we need the distribution of the statistic
when $H_0$ is not true. In each of these cases, the statistic has an
approximate distribution called a noncentral chi-square distribution.
The noncentral chi-square distribution will be discussed in
Section 10.3.

EXERCISES 


6.53. A number is to be selected from the interval $\{x: 0 < x < 2\}$ by a
random process. Let $A_i = \{x: (i - 1)/2 < x \le i/2\}$, $i = 1, 2, 3$, and let
$A_4 = \{x: \tfrac32 < x < 2\}$. A certain hypothesis assigns probabilities $p_{i0}$ to these
sets in accordance with $p_{i0} = \int_{A_i} (\tfrac12)(2 - x)\, dx$, $i = 1, 2, 3, 4$. This hypothesis
(concerning the multinomial p.d.f. with $k = 4$) is to be tested, at the 5
percent level of significance, by a chi-square test. If the observed frequencies
of the sets $A_i$, $i = 1, 2, 3, 4$, are, respectively, 30, 30, 10, 10, would $H_0$ be
accepted at the (approximate) 5 percent level of significance?

6.54. Let the following sets be defined: $A_1 = \{x: -\infty < x \le 0\}$,
$A_i = \{x: i - 2 < x \le i - 1\}$, $i = 2, \ldots, 7$, and $A_8 = \{x: 6 < x < \infty\}$. A
certain hypothesis assigns probabilities $p_{i0}$ to these sets $A_i$ in accordance
with



This hypothesis (concerning the multinomial p.d.f. with $k = 8$) is to be
tested, at the 5 percent level of significance, by a chi-square test. If the
observed frequencies of the sets $A_i$, $i = 1, 2, \ldots, 8$, are, respectively, 60, 96,
140, 210, 172, 160, 88, and 74, would $H_0$ be accepted at the (approximate)
5 percent level of significance?


6.55. A die was cast $n = 120$ independent times and the following data
resulted:

    Spots up      1     2     3     4     5     6
    Frequency     b    20    20    20    20    40 - b

If we use a chi-square test, for what values of $b$ would the hypothesis that
the die is unbiased be rejected at the 0.025 significance level?

6.56. Consider the problem from genetics of crossing two types of peas.
The Mendelian theory states that the probabilities of the classifications
(a) round and yellow, (b) wrinkled and yellow, (c) round and green, and
(d) wrinkled and green are $\tfrac{9}{16}$, $\tfrac{3}{16}$, $\tfrac{3}{16}$, and $\tfrac{1}{16}$, respectively. If, from 160
independent observations, the observed frequencies of these respective
Test, at the 0.05 significance level, the hypothesis of independence of the
A attribute and the B attribute, namely $H_0: P(A_i \cap B_j) = P(A_i)P(B_j)$,
$i = 1, 2, 3$ and $j = 1, 2, 3, 4$, against the alternative of dependence.

6.59. A certain genetic model suggests that the probabilities of a particular
trinomial distribution are, respectively, $p_1 = p^2$, $p_2 = 2p(1 - p)$, and
$p_3 = (1 - p)^2$, where $0 < p < 1$. If $X_1, X_2, X_3$ represent the respective
frequencies in $n$ independent trials, explain how we could check on the
adequacy of the genetic model.

6.60. Let the result of a random experiment be classified as one of the mutually
exclusive and exhaustive ways $A_1, A_2, A_3$ and also as one of the


classifications are 86, 35, 26, and 13, are these data consistent with the
Mendelian theory? That is, test, with $\alpha = 0.01$, the hypothesis that the
respective probabilities are $\tfrac{9}{16}$, $\tfrac{3}{16}$, $\tfrac{3}{16}$, and $\tfrac{1}{16}$.

6.57. Two different teaching procedures were used on two different groups
of students. Each group contained 100 students of about the same ability.
At the end of the term, an evaluating team assigned a letter grade to each
student. The results were tabulated as follows.

                        Grade
    Group      A     B     C     D     F     Total
    I         15    25    32    17    11      100
    II         9    18    29    28    16      100

If we consider these data to be independent observations from two
respective multinomial distributions with $k = 5$, test, at the 5 percent
significance level, the hypothesis that the two distributions are the same
(and hence the two teaching procedures are equally effective).

6.58. Let the result of a random experiment be classified as one of the mutually
exclusive and exhaustive ways $A_1, A_2, A_3$ and also as one of the mutually
exclusive and exhaustive ways $B_1, B_2, B_3, B_4$. Two hundred independent
trials of the experiment result in the following data:


A 




3 4 
1 2 


5 17 
12 2 


52 


2 2 1 


5110 


I 2 3 
              B_1         B_2        B_3        B_4
    A_1     15 - 3k     15 - k     15 + k     15 + 3k
    A_2       15          15         15         15
    A_3     15 + 3k     15 + k     15 - k     15 - 3k

where $k$ is one of the integers 0, 1, 2, 3, 4, 5. What is the smallest value of
$k$ that will lead to the rejection of the independence of the A attribute and
the B attribute at the $\alpha = 0.05$ significance level?

6.61. It is proposed to fit the Poisson distribution to the following data:

    x             0     1     2     3     3 < x
    Frequency    20    40    16    18       6

(a) Compute the corresponding chi-square goodness-of-fit statistic.
Hint: In computing the mean, treat $3 < x$ as $x = 4$.

(b) How many degrees of freedom are associated with this chi-square?

(c) Do these data result in the rejection of the Poisson model at the $\alpha = 0.05$
significance level?


ADDITIONAL EXERCISES 


6.62. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from the distribution having p.d.f. $f(x) = 2x/\theta^2$, $0 < x < \theta$, zero
elsewhere.

(a) If $0 < c < 1$, show that $\Pr(c < Y_n/\theta < 1) = 1 - c^{2n}$.

(b) If $n = 5$ and if the observed value of $Y_n$ is 1.8, find a 99 percent
confidence interval for $\theta$.

6.63. If 0.35, 0.92, 0.56, and 0.71 are the four observed values of a random
sample from a distribution having p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero
elsewhere, find an estimate for $\theta$.


6.64. Let the table

    x             0     1     2     3     4     5
    Frequency     6    10    14    13     6     1

represent a summary of a random sample of size 50 from a Poisson
distribution. Find the maximum likelihood estimate of $\Pr(X = 2)$.

6.65. Let $X$ be $N(\mu, 100)$. To test $H_0: \mu = 80$ against $H_1: \mu > 80$, let the
critical region be defined by $C = \{(x_1, x_2, \ldots, x_{25}): \bar{x} \ge 83\}$, where $\bar{x}$ is the
sample mean of a random sample of size $n = 25$ from this distribution.

(a) How is the power function $K(\mu)$ defined for this test?

(b) What is the significance level of this test?

(c) What are the values of $K(80)$, $K(83)$, and $K(86)$?

(d) Sketch the graph of the power function.

(e) What is the p-value corresponding to $\bar{x} = 83.41$?

6.66. Let $X$ equal the yield of alfalfa in tons per acre per year. Assume that
$X$ is $N(1.5, 0.09)$. It is hoped that new fertilizer will increase the average
yield. We shall test the null hypothesis $H_0: \mu = 1.5$ against the alternative
hypothesis $H_1: \mu > 1.5$. Assume that the variance continues to equal
$\sigma^2 = 0.09$ with the new fertilizer. Using $\bar{X}$, the mean of a random sample
of size $n$, as the test statistic, reject $H_0$ if $\bar{x} \ge c$. Find $n$ and $c$ so that the
power function $K(\mu) = \Pr(\bar{X} \ge c;\ \mu)$ is such that $\alpha = K(1.5) = 0.05$ and
$K(1.7) = 0.95$.

6.67. A random sample of 100 observations from a Poisson distribution has
a mean equal to 6.25. Construct an approximate 95 percent confidence
interval for the mean of the distribution.

6.68. Say that a random sample of size 25 is taken from a binomial
distribution with parameters $n = 5$ and $p$. These data are then lost, but we
recall that the relative frequency of the value 5 was Under these
conditions, how would you estimate $p$? Is this suggested estimate unbiased?

6.69. When 100 tacks were thrown on a table, 60 of them landed point up.
Obtain a 95 percent confidence interval for the probability that a tack of
this type will land point up. Assume independence.

6.70. Let $X_1, X_2, \ldots, X_8$ be a random sample of size $n = 8$ from a Poisson
distribution with mean $\mu$. Reject the simple null hypothesis $H_0: \mu = 0.5$ and
accept $H_1: \mu > 0.5$ if the observed sum $\sum_{i=1}^{8} x_i \ge 8$.

(a) Compute the significance level $\alpha$ of the test.

(b) Find the power function $K(\mu)$ of the test as a sum of Poisson
probabilities.

(c) Using the Appendix, determine $K(0.75)$, $K(1)$, and $K(1.25)$.

6.71. Let $p$ denote the probability that, for a particular tennis player, the
first serve is good. Since $p = 0.40$, this player decided to take lessons in
order to increase $p$. When the lessons are completed, the hypothesis
$H_0: p = 0.40$ will be tested against $H_1: p > 0.40$ based on $n = 25$ trials. Let
$y$ equal the number of first serves that are good, and let the critical region
be defined by $C = \{y: y \ge 13\}$.

(a) Determine $\alpha = \Pr(Y \ge 13;\ p = 0.40)$.

(b) Find $\beta = \Pr(Y < 13)$ when $p = 0.60$; that is, $\beta = \Pr(Y \le 12;\ p = 0.60)$.

6.72. The mean birth weight in the United States is $\mu = 3315$ grams with a
standard deviation of $\sigma = 575$. Let $X$ equal the birth weight in grams in
Jerusalem. Assume that the distribution of $X$ is $N(\mu, \sigma^2)$. We shall test
the null hypothesis $H_0: \mu = 3315$ against the alternative hypothesis
$H_1: \mu < 3315$ using a random sample of size $n = 30$.

(a) Define a critical region that has a significance level of $\alpha = 0.05$.

(b) If the random sample of $n = 30$ yielded $\bar{x} = 3189$ and $s = 488$, what is
your conclusion?

(c) What is the approximate p-value of your test?

6.73. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size 5 from the distribution having p.d.f. $f(x) = \exp[-(x - \theta)/\beta]/\beta$,
$\theta < x < \infty$, zero elsewhere. Discuss the construction of a 90 percent
confidence interval for $\beta$ if $\theta$ is known.

6.74. Three independent random samples, each of size 6, are drawn from three
normal distributions having common unknown variance. We find the three
sample variances to be 10, 14, and 8, respectively.

(a) Compute an unbiased estimate of the common variance.

(b) Determine a 90 percent confidence interval for the common variance.

6.75. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.

(a) If the constant $b$ is defined by the equation $\Pr(X \le b) = 0.90$, find the
m.l.e. of $b$.

(b) If $c$ is a given constant, find the m.l.e. of $\Pr(X \le c)$.


6.76. Let $\bar{X}_1$, $\bar{X}_2$, and $\bar{X}_3$ and $S_1^2$, $S_2^2$, and $S_3^2$ denote the means and the variances
of three independent random samples, each of size 10, from a normal
distribution with mean $\mu$ and variance $\sigma^2$. Find the constant $c$ so that


$$\Pr\left(-c < \frac{\bar{X}_1 + \bar{X}_2 - 2\bar{X}_3}{\sqrt{10S_1^2 + 10S_2^2 + 10S_3^2}} < c\right) = 0.95.$$



6.77. Let $Y$ be $b(192, p)$. We reject $H_0: p = 0.75$ and accept $H_1: p > 0.75$
if and only if $Y \ge 152$. Use the normal approximation to determine:

(a) $\alpha = \Pr(Y \ge 152;\ p = 0.75)$.

(b) $\beta = \Pr(Y < 152)$ when $p = 0.80$.

6.78. Let $Y$ be $b(100, p)$. To test $H_0: p = 0.08$ against $H_1: p < 0.08$, we reject
$H_0$ and accept $H_1$ if and only if $Y \le 6$.

(a) Determine the significance level $\alpha$ of the test.

(b) Find the probability of the type II error if in fact $p = 0.04$.

6.79. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli distribution
with parameter $p$. If $p$ is restricted so that we know that $\tfrac{1}{2} \le p \le 1$, find the
m.l.e. of this parameter.

6.80. Consider two Bernoulli distributions with unknown parameters $p_1$ and
$p_2$, respectively. If $Y$ and $Z$ equal the numbers of successes in two
independent random samples, each of sample size $n$, from the respective
distributions, determine the maximum likelihood estimators of $p_1$ and $p_2$ if
we know that $0 \le p_1 \le p_2 \le 1$.

6.81. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be $n$ i.i.d. pairs of random
variables, each with the bivariate normal distribution having five
parameters $\mu_1$, $\mu_2$, $\sigma_1^2$, $\sigma_2^2$, and $\rho$.

(a) Show that $Z_i = X_i - Y_i$ is $N(\mu, \sigma^2)$, where $\mu = \mu_1 - \mu_2$ and
$\sigma^2 = \sigma_1^2 - 2\rho\sigma_1\sigma_2 + \sigma_2^2$, $i = 1, 2, \ldots, n$.

(b) Since all five parameters are unknown, $\mu$ and $\sigma^2$ are unknown. To test
$H_0: \mu = 0$ ($H_0: \mu_1 = \mu_2$) against $H_1: \mu > 0$ ($H_1: \mu_1 > \mu_2$), construct a
$t$-test based upon the mean and the variance of the $n$ differences
$Z_1, Z_2, \ldots, Z_n$. This is often called a paired $t$-test.




CHAPTER 7


Sufficient Statistics


7.1 Measures of Quality of Estimators

In Chapter 6 we presented some procedures for finding point
estimates, interval estimates, and tests of statistical hypotheses. In this
and the next two chapters, we provide reasons why certain statistics are
used in these various statistical inferences. We begin by considering
desirable properties of a point estimator.

Now it would seem that if $y = u(x_1, x_2, \ldots, x_n)$ is to qualify as a
good point estimate of $\theta$, there should be a great probability that the
statistic $Y = u(X_1, X_2, \ldots, X_n)$ will be close to $\theta$; that is, $\theta$ should be
a sort of rallying point for the numbers $y = u(x_1, x_2, \ldots, x_n)$. This can
be achieved in one way by selecting $Y = u(X_1, X_2, \ldots, X_n)$ in such a
way that not only is $Y$ an unbiased estimator of $\theta$, but also the variance
of $Y$ is as small as it can be made. We do this because the variance of
$Y$ is a measure of the intensity of the concentration of the probability
for $Y$ in the neighborhood of the point $\theta = E(Y)$. Accordingly, we
define an unbiased minimum variance estimator of the parameter $\theta$ in
the following manner.

Definition 1. For a given positive integer $n$, $Y = u(X_1, X_2, \ldots, X_n)$
will be called an unbiased minimum variance estimator of the
parameter $\theta$ if $Y$ is unbiased, that is, $E(Y) = \theta$, and if the variance of $Y$
is less than or equal to the variance of every other unbiased estimator
of $\theta$.

For illustration, let $X_1, X_2, \ldots, X_9$ denote a random sample from
a distribution that is $N(\theta, 1)$, $-\infty < \theta < \infty$. Since the statistic
$\bar{X} = (X_1 + X_2 + \cdots + X_9)/9$ is $N(\theta, \tfrac{1}{9})$, $\bar{X}$ is an unbiased estimator of
$\theta$. The statistic $X_1$ is $N(\theta, 1)$, so $X_1$ is also an unbiased estimator of
$\theta$. Although the variance $\tfrac{1}{9}$ of $\bar{X}$ is less than the variance 1 of $X_1$, we
cannot say, with $n = 9$, that $\bar{X}$ is the unbiased minimum variance
estimator of $\theta$; that definition requires that the comparison be made
with every unbiased estimator of $\theta$. To be sure, it is quite impossible
to tabulate all other unbiased estimators of this parameter $\theta$, so other
methods must be developed for making the comparisons of the
variances. A beginning on this problem will be made in this chapter.

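
The comparison of the two unbiased estimators in this illustration can be checked by simulation. The following is only a sketch, not part of the original text: the function name `simulate_estimators`, the choice $\theta = 3$, the number of replications, and the seed are all assumptions made for the illustration.

```python
import random
import statistics

def simulate_estimators(theta, n=9, reps=20000, seed=0):
    """Draw `reps` samples of size n from N(theta, 1); return X-bar and X_1 for each."""
    rng = random.Random(seed)          # fixed seed so the run is reproducible
    means, firsts = [], []
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)  # the sample mean X-bar
        firsts.append(sample[0])       # the single observation X_1
    return means, firsts

means, firsts = simulate_estimators(theta=3.0)
print("average of X-bar :", statistics.fmean(means))
print("average of X_1   :", statistics.fmean(firsts))
print("variance of X-bar:", statistics.pvariance(means))
print("variance of X_1  :", statistics.pvariance(firsts))
```

Both averages should land near $\theta$, while the two simulated variances should land near $\tfrac{1}{9}$ and 1. A simulation of this kind compares only these two estimators; it says nothing about the whole class of unbiased estimators, which is the point of the paragraph above.
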
Let us now discuss the problem of point estimation of a parameter
from a slightly different standpoint. Let $X_1, X_2, \ldots, X_n$ denote a
random sample of size $n$ from a distribution that has the p.d.f. $f(x; \theta)$,
$\theta \in \Omega$. The distribution may be either of the continuous or the discrete
type. Let $Y = u(X_1, X_2, \ldots, X_n)$ be a statistic on which we wish to base
a point estimate of the parameter $\theta$. Let $\delta(y)$ be that function of the
observed value of the statistic $Y$ which is the point estimate of $\theta$. Thus
the function $\delta$ decides the value of our point estimate of $\theta$ and $\delta$ is called
a decision function or a decision rule. One value of the decision function,
say $\delta(y)$, is called a decision. Thus a numerically determined point
estimate of a parameter $\theta$ is a decision. Now a decision may be correct
or it may be wrong. It would be useful to have a measure of the
seriousness of the difference, if any, between the true value of $\theta$ and the
point estimate $\delta(y)$. Accordingly, with each pair $[\theta, \delta(y)]$, $\theta \in \Omega$, we
will associate a nonnegative number $\mathcal{L}[\theta, \delta(y)]$ that reflects this
seriousness. We call the function $\mathcal{L}$ the loss function. The expected
(mean) value of the loss function is called the risk function. If $g(y; \theta)$,
$\theta \in \Omega$, is the p.d.f. of $Y$, the risk function $R(\theta, \delta)$ is given by

$$R(\theta, \delta) = E\{\mathcal{L}[\theta, \delta(Y)]\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, g(y; \theta)\, dy$$

if $Y$ is a random variable of the continuous type. It would be desirable
to select a decision function that minimizes the risk $R(\theta, \delta)$ for all values
of $\theta$, $\theta \in \Omega$. But this is usually impossible because the decision function
$\delta$ that minimizes $R(\theta, \delta)$ for one value of $\theta$ may not minimize
$R(\theta, \delta)$ for another value of $\theta$. Accordingly, we need either to restrict
our decision function to a certain class or to consider methods of
ordering the risk functions. The following example, while very simple,
dramatizes these difficulties.

Example 1. Let $X_1, X_2, \ldots, X_{25}$ be a random sample from a distribution
that is $N(\theta, 1)$, $-\infty < \theta < \infty$. Let $Y = \bar{X}$, the mean of the random sample,
and let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. We shall compare the two decision functions
given by $\delta_1(y) = y$ and $\delta_2(y) = 0$ for $-\infty < y < \infty$. The corresponding risk
functions are

$$R(\theta, \delta_1) = E[(\theta - Y)^2] = \tfrac{1}{25}$$

and

$$R(\theta, \delta_2) = E[(\theta - 0)^2] = \theta^2.$$

Obviously, if, in fact, $\theta = 0$, then $\delta_2(y) = 0$ is an excellent decision and we have
$R(0, \delta_2) = 0$. However, if $\theta$ differs from zero by very much, it is equally
clear that $\delta_2(y) = 0$ is a poor decision. For example, if, in fact, $\theta = 2$,
$R(2, \delta_2) = 4 > R(2, \delta_1) = \tfrac{1}{25}$. In general, we see that $R(\theta, \delta_2) < R(\theta, \delta_1)$,
provided that $-\tfrac{1}{5} < \theta < \tfrac{1}{5}$, and that otherwise $R(\theta, \delta_2) \ge R(\theta, \delta_1)$. That is, one
of these decision functions is better than the other for some values of $\theta$ and
the other decision function is better for other values of $\theta$. If, however, we
had restricted our consideration to decision functions $\delta$ such that $E[\delta(Y)] = \theta$
for all values of $\theta$, $\theta \in \Omega$, then the decision $\delta_2(y) = 0$ is not allowed. Under this
restriction and with the given $\mathcal{L}[\theta, \delta(y)]$, the risk function is the variance of
the unbiased estimator $\delta(Y)$, and we are confronted with the problem of
finding the unbiased minimum variance estimator. Later in this chapter we
show that the solution is $\delta(y) = y = \bar{x}$.

Suppose, however, that we do not want to restrict ourselves to decision
functions $\delta$ such that $E[\delta(Y)] = \theta$ for all values of $\theta \in \Omega$. Instead, let us
say that the decision function that minimizes the maximum of the risk
function is the best decision function. Because, in this example, $R(\theta, \delta_2) = \theta^2$
is unbounded, $\delta_2(y) = 0$ is not, in accordance with this criterion, a good
decision function. On the other hand, with $-\infty < \theta < \infty$, we have

$$\max_{\theta} R(\theta, \delta_1) = \max_{\theta}\left(\tfrac{1}{25}\right) = \tfrac{1}{25}.$$

Accordingly, $\delta_1(y) = y = \bar{x}$ seems to be a very good decision in accordance
with this criterion because $\tfrac{1}{25}$ is small. As a matter of fact, it can be proved that
$\delta_1$ is the best decision function, as measured by the minimax criterion, when
the loss function is $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$.
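
The two risk functions of Example 1 can be approximated numerically. The sketch below is an illustration under stated assumptions rather than anything from the text: the helper `risk_mc`, the Monte Carlo sample size, and the seed are hypothetical choices, and the code uses the fact that $Y = \bar{X}$ is $N(\theta, \tfrac{1}{25})$.

```python
import random

N = 25  # sample size used in Example 1

def risk_mc(delta, theta, reps=20000, seed=1):
    """Monte Carlo estimate of E{[theta - delta(Y)]^2}, where Y ~ N(theta, 1/N)."""
    rng = random.Random(seed)
    sd = (1.0 / N) ** 0.5              # standard deviation of the sample mean
    total = 0.0
    for _ in range(reps):
        y = rng.gauss(theta, sd)       # one simulated value of the statistic Y
        total += (theta - delta(y)) ** 2
    return total / reps

def delta1(y):
    return y                           # decision function delta_1(y) = y

def delta2(y):
    return 0.0                         # decision function delta_2(y) = 0

for theta in (0.0, 0.1, 0.2, 0.5, 2.0):
    print(theta, round(risk_mc(delta1, theta), 4), round(risk_mc(delta2, theta), 4))
```

The estimated risk of $\delta_1$ should stay near $\tfrac{1}{25} = 0.04$ for every $\theta$, whereas the risk of $\delta_2$ equals $\theta^2$, which is smaller than 0.04 only when $|\theta| < \tfrac{1}{5}$ and grows without bound otherwise.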

In this example we illustrated the following:

1. Without some restriction on the decision function, it is difficult to
find a decision function that has a risk function which is uniformly
less than the risk function of another decision function.

2. A principle of selecting a best decision function, called the minimax
principle. This principle may be stated as follows: If the decision
function given by $\delta_0(y)$ is such that, for all $\theta \in \Omega$,

$$\max_{\theta} R[\theta, \delta_0(y)] \le \max_{\theta} R[\theta, \delta(y)]$$

for every other decision function $\delta(y)$, then $\delta_0(y)$ is called a minimax
decision function.


With the restriction $E[\delta(Y)] = \theta$ and the loss function
$\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$, the decision function that minimizes the risk
function yields an unbiased estimator with minimum variance. If,
however, the restriction $E[\delta(Y)] = \theta$ is replaced by some other
condition, the decision function $\delta(Y)$, if it exists, which minimizes
$E\{[\theta - \delta(Y)]^2\}$ uniformly in $\theta$ is sometimes called the minimum
mean-square-error estimator. Exercises 7.6, 7.7, and 7.8 provide
examples of this type of estimator.

There are two additional observations about decision rules and loss
functions that should be made at this point. First, since $Y$ is a statistic,
the decision rule $\delta(Y)$ is also a statistic, and we could have started
directly with a decision rule based on the observations in a random
sample, say $\delta_1(X_1, X_2, \ldots, X_n)$. The risk function is then given by

$$R(\theta, \delta_1) = E\{\mathcal{L}[\theta, \delta_1(X_1, X_2, \ldots, X_n)]\}
= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta_1(x_1, x_2, \ldots, x_n)]\, f(x_1; \theta) \cdots f(x_n; \theta)\, dx_1 \cdots dx_n$$

if the random sample arises from a continuous-type distribution. We
did not do this because, as you will see in this chapter, it is rather easy
to find a good statistic, say $Y$, upon which to base all of the statistical
inferences associated with a particular model. Thus we thought it more
appropriate to start with a statistic that would be familiar, like the
m.l.e. $Y = \bar{X}$ in Example 1. The second decision rule of that example
could be written $\delta_2(X_1, X_2, \ldots, X_n) = 0$, a constant no matter what
values of $X_1, X_2, \ldots, X_n$ are observed.

The second observation is that we have only used one loss
function, namely the square-error loss function $\mathcal{L}(\theta, \delta) = (\theta - \delta)^2$.
The absolute-error loss function $\mathcal{L}(\theta, \delta) = |\theta - \delta|$ is another popular
one. The loss function defined by

$$\mathcal{L}(\theta, \delta) = \begin{cases} 0, & |\theta - \delta| \le a, \\ b, & |\theta - \delta| > a, \end{cases}$$

where $a$ and $b$ are positive constants, is sometimes referred to as the
goalpost loss function. The reason for this terminology is that football
fans recognize it is like kicking a field goal: There is no loss (actually
a three-point gain) if within $a$ units of the middle but $b$ units of loss
(zero points awarded) if outside that restriction. In addition, loss
functions can be asymmetric as well as symmetric as the three previous
ones have been. That is, for example, it might be more costly to
underestimate the value of $\theta$ than to overestimate it. (Many of us think
about this type of loss function when estimating the time it takes us
to reach an airport to catch a plane.) Some of these loss functions are
considered when studying Bayesian estimates in Chapter 8.
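
The loss functions named in this paragraph translate directly into code. The sketch below is illustrative only. The three symmetric losses follow the definitions above, while `asymmetric_loss` and its weights `under` and `over` are hypothetical, chosen simply to show a loss that penalizes underestimation more heavily than overestimation.

```python
def squared_error_loss(theta, delta):
    """Square-error loss (theta - delta)^2."""
    return (theta - delta) ** 2

def absolute_error_loss(theta, delta):
    """Absolute-error loss |theta - delta|."""
    return abs(theta - delta)

def goalpost_loss(theta, delta, a, b):
    """Goalpost loss: 0 when |theta - delta| <= a, and b otherwise (a, b > 0)."""
    return 0.0 if abs(theta - delta) <= a else b

def asymmetric_loss(theta, delta, under=3.0, over=1.0):
    """Assumed example of an asymmetric loss: each unit of underestimation
    costs `under`, each unit of overestimation costs `over`."""
    err = delta - theta
    return over * err if err >= 0 else under * (-err)

print(squared_error_loss(2.0, 0.0))        # loss of deciding 0 when theta = 2
print(goalpost_loss(2.0, 2.4, a=0.5, b=7)) # inside the goalposts
print(goalpost_loss(2.0, 3.0, a=0.5, b=7)) # outside the goalposts
print(asymmetric_loss(10.0, 8.0), asymmetric_loss(10.0, 12.0))
```

With the assumed weights, missing $\theta = 10$ by two units on the low side costs three times as much as missing it by two units on the high side, which mirrors the airport example.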

Let us close this section with an interesting illustration that raises
a question leading to the likelihood principle, which many statisticians
believe is a quality characteristic that estimators should enjoy. Suppose
that two statisticians, $A$ and $B$, observe 10 independent trials of a
random experiment ending in success or failure. Let the probability of
success on each trial be $\theta$, where $0 < \theta < 1$. Let us say that each
statistician observes one success in these 10 trials. Suppose, however,
that $A$ had decided to take $n = 10$ such observations in advance and
found only one success, while $B$ had decided to take as many
observations as needed to get the first success, which happened on the
10th trial. The model of $A$ is that $Y$ is $b(n = 10, \theta)$ and $y = 1$ is observed.
On the other hand, $B$ is considering the random variable $Z$ that has
a geometric p.d.f. $g(z) = (1 - \theta)^{z-1}\theta$, $z = 1, 2, 3, \ldots$, and $z = 10$ is
observed. In either case, the relative frequency of success is

$$\frac{y}{n} = \frac{1}{z} = \frac{1}{10},$$

which could be used as an estimate of $\theta$.

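Let us observe, however, that one of the corresponding estimators,
$Y/n$ and $1/Z$, is biased. We have

$$E\left(\frac{Y}{10}\right) = \frac{1}{10}E(Y) = \frac{1}{10}(10\theta) = \theta,$$

while

$$E\left(\frac{1}{Z}\right) = \sum_{z=1}^{\infty} \frac{1}{z}(1 - \theta)^{z-1}\theta
= \theta + \tfrac{1}{2}(1 - \theta)\theta + \tfrac{1}{3}(1 - \theta)^2\theta + \cdots > \theta.$$

That is, $1/Z$ is a biased estimator while $Y/10$ is unbiased. Thus $A$ is using
an unbiased estimator while $B$ is not. Should we adjust $B$'s estimator
so that it too is unbiased?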

It is interesting to note that if we maximize the two respective
likelihood functions, namely

$$L_1(\theta) = \binom{10}{y}\theta^{y}(1 - \theta)^{10-y}$$

and

$$L_2(\theta) = (1 - \theta)^{z-1}\theta,$$

with $n = 10$, $y = 1$, and $z = 10$, we get exactly the same answer, $\hat{\theta} = \tfrac{1}{10}$.
This must be the case, because in each situation we are maximizing
$(1 - \theta)^{9}\theta$. Many statisticians believe that this is the way it should be
and accordingly adopt the likelihood principle:

Suppose two different sets of data from possibly two different random
experiments lead to respective likelihood functions, $L_1(\theta)$ and $L_2(\theta)$, that are
proportional to each other. These two data sets provide the same
information about the parameter $\theta$ and a statistician should obtain the
same estimate of $\theta$ from either.

In our special illustration, we note that $L_1(\theta) \propto L_2(\theta)$, and the
likelihood principle states that statisticians $A$ and $B$ should make the
same inference. Thus believers in the likelihood principle would not
adjust the second estimator to make it unbiased.
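
The claims in this illustration can be verified numerically. The following sketch is an illustration only: the grid search over $\theta$, the truncation of the infinite series at 5000 terms, and all function names are assumptions introduced here.

```python
from math import comb

def L1(theta, n=10, y=1):
    """Binomial likelihood of statistician A: C(n, y) theta^y (1 - theta)^(n - y)."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)

def L2(theta, z=10):
    """Geometric likelihood of statistician B: (1 - theta)^(z - 1) theta."""
    return (1 - theta) ** (z - 1) * theta

# Maximize each likelihood over a grid of theta values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
mle1 = max(grid, key=L1)
mle2 = max(grid, key=L2)

# The ratio L1/L2 should be the same constant at every theta (proportionality).
ratios = {round(L1(t) / L2(t), 9) for t in (0.05, 0.1, 0.3, 0.7)}

def expected_inverse_z(theta, terms=5000):
    """Truncated series for E(1/Z) when Z is geometric on 1, 2, 3, ..."""
    return sum((1 - theta) ** (z - 1) * theta / z for z in range(1, terms + 1))

print(mle1, mle2, ratios)
print(expected_inverse_z(0.1))
```

Both grid maximizers coincide, and the ratio of the two likelihoods is the constant $\binom{10}{1}$. The truncated series for $E(1/Z)$ at $\theta = 0.1$ exceeds 0.1, in line with the bias noted above.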

EXERCISES 

7.1. Show that the mean $\bar{X}$ of a random sample of size $n$ from a distribution
having p.d.f. $f(x; \theta) = (1/\theta)e^{-(x/\theta)}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere,
is an unbiased estimator of $\theta$ and has variance $\theta^2/n$.

7.2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution
with mean zero and variance $\theta$, $0 < \theta < \infty$. Show that $\sum_{i=1}^{n} X_i^2/n$ is an unbiased
estimator of $\theta$ and has variance $2\theta^2/n$.

7.3. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 from
the uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, $0 < \theta < \infty$,
zero elsewhere. Show that $4Y_1$, $2Y_2$, and $\tfrac{4}{3}Y_3$ are all unbiased estimators of
$\theta$. Find the variance of each of these unbiased estimators.

7.4. Let $Y_1$ and $Y_2$ be two independent unbiased estimators of $\theta$. Say the
variance of $Y_1$ is twice the variance of $Y_2$. Find the constants $k_1$ and $k_2$ so
that $k_1 Y_1 + k_2 Y_2$ is an unbiased estimator with smallest possible variance
for such a linear combination.

7.5. In Example 1 of this section, take $\mathcal{L}[\theta, \delta(y)] = |\theta - \delta(y)|$. Show that
$R(\theta, \delta_1) = \tfrac{1}{5}\sqrt{2/\pi}$ and $R(\theta, \delta_2) = |\theta|$. Of these two decision functions $\delta_1$ and
$\delta_2$, which yields the smaller maximum risk?

7.6. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution
with parameter $\theta$, $0 < \theta < \infty$. Let $Y = \sum_{i=1}^{n} X_i$ and let
$\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. If we restrict our considerations to decision functions of the
form $\delta(y) = b + y/n$, where $b$ does not depend upon $y$, show that
$R(\theta, \delta) = b^2 + \theta/n$. What decision function of this form yields a uniformly
smaller risk than every other decision function of this form? With this
solution, say $\delta$, and $0 < \theta < \infty$, determine $\max_{\theta} R(\theta, \delta)$ if it exists.

7.7. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$N(\mu, \theta)$, $0 < \theta < \infty$, where $\mu$ is unknown. Let $Y = \sum_{i=1}^{n} (X_i - \bar{X})^2/n = S^2$ and
let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. If we consider decision functions of the form
$\delta(y) = by$, where $b$ does not depend upon $y$, show that
$R(\theta, \delta) = (\theta^2/n^2)[(n^2 - 1)b^2 - 2n(n - 1)b + n^2]$. Show that $b = n/(n + 1)$ yields a
minimum risk for decision functions of this form. Note that $nY/(n + 1)$ is
not an unbiased estimator of $\theta$. With $\delta(y) = ny/(n + 1)$ and $0 < \theta < \infty$,
determine $\max_{\theta} R(\theta, \delta)$ if it exists.

7.8. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is
$b(1, \theta)$, $0 \le \theta \le 1$. Let $Y = \sum_{i=1}^{n} X_i$ and let $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. Consider
decision functions of the form $\delta(y) = by$, where $b$ does not depend upon $y$.
Prove that $R(\theta, \delta) = b^2 n\theta(1 - \theta) + (bn - 1)^2\theta^2$. Show that

$$\max_{\theta} R(\theta, \delta) = \frac{b^4 n^2}{4[b^2 n - (bn - 1)^2]},$$

provided that the value $b$ is such that $b^2 n \ge 2(bn - 1)^2$. Prove that $b = 1/n$
does not minimize $\max_{\theta} R(\theta, \delta)$.
7.9. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with
mean $\theta > 0$.

(a) Statistician $A$ observes the sample to be the values $x_1, x_2, \ldots, x_n$ with
sum $y_1 = \sum x_i$. Find the m.l.e. of $\theta$.

(b) Statistician $B$ loses the sample values $x_1, x_2, \ldots, x_n$ but remembers the
sum $y_1$ and the fact that the sample arose from a Poisson distribution.
Thus $B$ decides to create some fake observations which he calls
$z_1, z_2, \ldots, z_n$ (as he knows they will probably not equal the original
$x$-values) as follows. He notes that the conditional probability of
independent Poisson random variables $Z_1, Z_2, \ldots, Z_n$ being equal to
$z_1, z_2, \ldots, z_n$, given $\sum z_i = y_1$, is

$$\frac{\dfrac{\theta^{z_1}e^{-\theta}}{z_1!}\,\dfrac{\theta^{z_2}e^{-\theta}}{z_2!}\cdots\dfrac{\theta^{z_n}e^{-\theta}}{z_n!}}{\dfrac{(n\theta)^{y_1}e^{-n\theta}}{y_1!}}
= \frac{y_1!}{z_1!\,z_2!\cdots z_n!}\left(\frac{1}{n}\right)^{z_1}\left(\frac{1}{n}\right)^{z_2}\cdots\left(\frac{1}{n}\right)^{z_n}$$

since $Y_1 = \sum Z_i$ has a Poisson distribution with mean $n\theta$. The latter
distribution is multinomial with $y_1$ independent trials, each terminating
in one of $n$ mutually exclusive and exhaustive ways, each of which has
the same probability $1/n$. Accordingly, $B$ runs such a multinomial
experiment with $y_1$ independent trials and obtains $z_1, z_2, \ldots, z_n$. Find the
likelihood function using these $z$-values. Is it proportional to that of
statistician $A$?

Hint: Here the likelihood function is the product of this conditional
p.d.f. and the p.d.f. of $Y_1 = \sum Z_i$.

7.2 A Sufficient Statistic for a Parameter

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a
distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. In Chapter 6 and Section 7.1
we constructed statistics to make statistical inferences as illustrated by
point and interval estimation and tests of statistical hypotheses. We
note that a statistic, say $Y = u(X_1, X_2, \ldots, X_n)$, is a form of data
reduction. For illustration, instead of listing all of the individual
observations $X_1, X_2, \ldots, X_n$, we might prefer to give only the sample
mean $\bar{X}$ or the sample variance $S^2$. Thus statisticians look for ways of
reducing a set of data so that these data can be more easily understood
without losing the meaning associated with the entire set of
observations.

It is interesting to note that a statistic $Y = u(X_1, X_2, \ldots, X_n)$ really
partitions the sample space of $X_1, X_2, \ldots, X_n$. For illustration,
suppose we say that the sample was observed and $\bar{x} = 8.32$. There are
many points in the sample space which have that same mean of 8.32,
and we can consider them as belonging to the set $\{(x_1, x_2, \ldots, x_n) :
\bar{x} = 8.32\}$. As a matter of fact, all points on the hyperplane

$$x_1 + x_2 + \cdots + x_n = (8.32)n$$

yield the mean of $\bar{x} = 8.32$, so this hyperplane is that set. However,
there are many values that $\bar{X}$ can take and thus there are many
such sets. So, in this sense, the sample mean $\bar{X}$, or any statistic
$Y = u(X_1, X_2, \ldots, X_n)$, partitions the sample space into a collection
of sets.

Often in the study of statistics the parameter $\theta$ of the model is
unknown; thus we desire to make some statistical inference about it. In
this section we consider a statistic denoted by $Y_1 = u_1(X_1, X_2, \ldots, X_n)$,
which we call a sufficient statistic and which we find is good for making
those inferences. This sufficient statistic partitions the sample space in
such a way that, given

$$(X_1, X_2, \ldots, X_n) \in \{(x_1, x_2, \ldots, x_n) : u_1(x_1, x_2, \ldots, x_n) = y_1\},$$

the conditional probability of $X_1, X_2, \ldots, X_n$ does not depend upon $\theta$.
Intuitively, this means that once the set determined by $Y_1 = y_1$ is fixed,
the distribution of another statistic, say $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, does
not depend upon the parameter $\theta$ because the conditional distribution
of $X_1, X_2, \ldots, X_n$ does not depend upon $\theta$. Hence it is impossible to
use $Y_2$, given $Y_1 = y_1$, to make a statistical inference about $\theta$. So, in a
sense, $Y_1$ exhausts all the information about $\theta$ that is contained in the
sample. This is why we call $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ a sufficient
statistic.

To understand clearly the definition of a sufficient statistic for a
parameter $\theta$, we start with an illustration.

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the
distribution that has p.d.f.

$$f(x; \theta) = \begin{cases} \theta^{x}(1 - \theta)^{1-x}, & x = 0, 1;\ 0 < \theta < 1, \\ 0 & \text{elsewhere.} \end{cases}$$

The statistic $Y_1 = X_1 + X_2 + \cdots + X_n$ has the p.d.f.

$$g_1(y_1; \theta) = \begin{cases} \dbinom{n}{y_1}\theta^{y_1}(1 - \theta)^{n-y_1}, & y_1 = 0, 1, \ldots, n, \\ 0 & \text{elsewhere.} \end{cases}$$

What is the conditional probability

$$\Pr(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid Y_1 = y_1) = P(A \mid B),$$

say, where $y_1 = 0, 1, 2, \ldots, n$? Unless the sum of the integers $x_1, x_2, \ldots, x_n$
(each of which equals zero or 1) is equal to $y_1$, the conditional probability
obviously equals zero because $A \cap B = \emptyset$. But in the case $y_1 = \sum x_i$, we have
that $A \subset B$ so that $A \cap B = A$ and $P(A \mid B) = P(A)/P(B)$; thus the conditional
probability equals

$$\frac{\theta^{x_1}(1 - \theta)^{1-x_1}\,\theta^{x_2}(1 - \theta)^{1-x_2}\cdots\theta^{x_n}(1 - \theta)^{1-x_n}}{\dbinom{n}{y_1}\theta^{y_1}(1 - \theta)^{n-y_1}}
= \frac{\theta^{\sum x_i}(1 - \theta)^{n-\sum x_i}}{\dbinom{n}{\sum x_i}\theta^{\sum x_i}(1 - \theta)^{n-\sum x_i}}
= \frac{1}{\dbinom{n}{\sum x_i}}.$$

Since $y_1 = x_1 + x_2 + \cdots + x_n$ equals the number of 1's in the $n$
independent trials, this conditional probability is the probability of selecting
a particular arrangement of $y_1$ 1's and $(n - y_1)$ zeros. Note that this
conditional probability does not depend upon the value of the parameter $\theta$.
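
The conclusion of this example can be confirmed by brute-force enumeration for a small $n$. The sketch below is an illustration, not part of the text: the function name `conditional_prob` and the particular sample `x` are assumptions, and the marginal probability of $Y_1 = y_1$ is obtained by summing the joint p.d.f. over all 0/1 sequences with that sum rather than from the binomial formula.

```python
from itertools import product
from math import comb, isclose

def conditional_prob(x, theta):
    """Pr(X_1 = x_1, ..., X_n = x_n | Y_1 = sum(x)) for i.i.d. Bernoulli(theta) trials."""
    n, y1 = len(x), sum(x)
    joint = theta**y1 * (1 - theta) ** (n - y1)
    # Pr(Y_1 = y1): add the joint probability of every 0/1 sequence with the same sum.
    marginal = sum(
        theta ** sum(s) * (1 - theta) ** (n - sum(s))
        for s in product((0, 1), repeat=n)
        if sum(s) == y1
    )
    return joint / marginal

x = (1, 0, 0, 1, 0)  # one particular arrangement with y1 = 2 successes in n = 5 trials
for theta in (0.2, 0.5, 0.9):
    print(theta, conditional_prob(x, theta), 1 / comb(len(x), sum(x)))
```

For each value of $\theta$ tried, the conditional probability agrees with $1/\binom{n}{y_1}$ up to rounding error, so it carries no information about $\theta$.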


In general, let $g_1(y_1; \theta)$ be the p.d.f. of the statistic $Y_1 =
u_1(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ is a random sample arising
from a distribution of the discrete type having p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The
conditional probability of $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, given
$Y_1 = y_1$, equals

$$\frac{f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]},$$

provided that $x_1, x_2, \ldots, x_n$ are such that the fixed $y_1 =
u_1(x_1, x_2, \ldots, x_n)$, and equals zero otherwise. We say that
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if this
ratio does not depend upon $\theta$. While, with distributions of the
continuous type, we cannot use the same argument, we do, in this case,
accept the fact that if this ratio does not depend upon $\theta$, then the
conditional distribution of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$, does not
depend upon $\theta$. Thus, in both cases, we use the same definition of a
sufficient statistic for $\theta$.

Definition 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample
of size $n$ from a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a statistic whose p.d.f. is $g_1(y_1; \theta)$. Then $Y_1$
is a sufficient statistic for $\theta$ if and only if

$$\frac{f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]} = H(x_1, x_2, \ldots, x_n),$$

where $H(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta \in \Omega$.

Remark. In most cases in this book, $X_1, X_2, \ldots, X_n$ do represent the
observations of a random sample; that is, they are i.i.d. It is not necessary,
however, in more general situations, that these random variables be
independent; as a matter of fact, they do not need to be identically distributed.
Thus, more generally, the definition of sufficiency of a statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ would be extended to read that

$$\frac{f(x_1, x_2, \ldots, x_n; \theta)}{g_1[u_1(x_1, x_2, \ldots, x_n); \theta]} = H(x_1, x_2, \ldots, x_n)$$

does not depend upon $\theta \in \Omega$, where $f(x_1, x_2, \ldots, x_n; \theta)$ is the joint p.d.f.
of $X_1, X_2, \ldots, X_n$. There are even a few situations in which we need an
extension like this one in this book.

We now give two examples that are illustrative of the definition. 


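Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma
distribution with $\alpha = 2$ and $\beta = \theta > 0$. Since the m.g.f. associated with this
distribution is $M(t) = (1 - \theta t)^{-2}$, $t < 1/\theta$, the m.g.f. of $Y_1 = \sum_{i=1}^{n} X_i$ is

$$E[e^{t(X_1 + X_2 + \cdots + X_n)}] = E(e^{tX_1})E(e^{tX_2})\cdots E(e^{tX_n})
= [(1 - \theta t)^{-2}]^{n} = (1 - \theta t)^{-2n}.$$

Thus $Y_1$ has a gamma distribution with $\alpha = 2n$ and $\beta = \theta$, so that its p.d.f. is

$$g_1(y_1; \theta) = \begin{cases} \dfrac{1}{\Gamma(2n)\theta^{2n}}\, y_1^{2n-1} e^{-y_1/\theta}, & 0 < y_1 < \infty, \\ 0 & \text{elsewhere.} \end{cases}$$

Thus we have that the ratio in Definition 2 equals

$$\frac{\left[\dfrac{x_1^{2-1}e^{-x_1/\theta}}{\Gamma(2)\theta^{2}}\right]\left[\dfrac{x_2^{2-1}e^{-x_2/\theta}}{\Gamma(2)\theta^{2}}\right]\cdots\left[\dfrac{x_n^{2-1}e^{-x_n/\theta}}{\Gamma(2)\theta^{2}}\right]}{\dfrac{(x_1 + x_2 + \cdots + x_n)^{2n-1}e^{-(x_1 + x_2 + \cdots + x_n)/\theta}}{\Gamma(2n)\theta^{2n}}}
= \frac{\Gamma(2n)}{[\Gamma(2)]^{n}}\,\frac{x_1 x_2 \cdots x_n}{(x_1 + x_2 + \cdots + x_n)^{2n-1}},$$

where $0 < x_i < \infty$, $i = 1, 2, \ldots, n$. Since this ratio does not depend upon $\theta$,
the sum $Y_1$ is a sufficient statistic for $\theta$.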
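
A numerical check of Example 2 evaluates the ratio of Definition 2 at several values of $\theta$ and compares it with the closed form obtained above. This is a sketch under stated assumptions: the helper names `gamma_pdf` and `definition_ratio` and the sample `xs` are hypothetical.

```python
from math import exp, gamma, isclose, prod

def gamma_pdf(x, alpha, beta):
    """Gamma p.d.f. x^(alpha-1) e^(-x/beta) / (Gamma(alpha) beta^alpha), for x > 0."""
    return x ** (alpha - 1) * exp(-x / beta) / (gamma(alpha) * beta**alpha)

def definition_ratio(xs, theta):
    """Joint p.d.f. of the sample divided by the p.d.f. of Y_1 = sum(xs), as in Definition 2."""
    n = len(xs)
    joint = prod(gamma_pdf(x, 2, theta) for x in xs)   # each X_i is gamma(2, theta)
    return joint / gamma_pdf(sum(xs), 2 * n, theta)    # Y_1 is gamma(2n, theta)

xs = [0.7, 1.9, 3.2, 0.4]   # an arbitrary sample with positive entries
n = len(xs)
closed_form = gamma(2 * n) / gamma(2) ** n * prod(xs) / sum(xs) ** (2 * n - 1)

for theta in (0.5, 1.0, 4.0):
    print(theta, definition_ratio(xs, theta), closed_form)
```

The printed ratio should match the closed form for each $\theta$, up to floating-point error, which is what sufficiency of $Y_1$ requires.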

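Example 3. Let $Y_1 < Y_2 < \cdots < Y_n$ denote the order statistics of a
random sample of size $n$ from the distribution with p.d.f.

$$f(x; \theta) = e^{-(x - \theta)} I_{(\theta, \infty)}(x).$$

Here we use the indicator function of set $A$ defined by

$$I_A(x) = \begin{cases} 1, & x \in A, \\ 0, & x \notin A. \end{cases}$$

This means, of course, that $f(x; \theta) = e^{-(x - \theta)}$, $\theta < x < \infty$, and zero elsewhere.
The p.d.f. of $Y_1 = \min(X_i)$ is

$$g_1(y_1; \theta) = n e^{-n(y_1 - \theta)} I_{(\theta, \infty)}(y_1).$$

Thus we have that

$$\frac{\prod_{i=1}^{n} e^{-(x_i - \theta)} I_{(\theta, \infty)}(x_i)}{n e^{-n(\min x_i - \theta)} I_{(\theta, \infty)}(\min x_i)}
= \frac{e^{-x_1 - x_2 - \cdots - x_n}}{n e^{-n \min x_i}},$$

since $\prod_{i=1}^{n} I_{(\theta, \infty)}(x_i) = I_{(\theta, \infty)}(\min x_i)$, because when $\theta < \min x_i$, then $\theta < x_i$,
$i = 1, 2, \ldots, n$, and at least one $x$-value is less than or equal to $\theta$ when
$\min x_i \le \theta$. Since this ratio does not depend upon $\theta$, the first order statistic
$Y_1$ is a sufficient statistic for $\theta$.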

If we are to show, by means of the definition, that a certain
statistic $Y_1$ is or is not a sufficient statistic for a parameter $\theta$, we must
first of all know the p.d.f. of $Y_1$, say $g_1(y_1; \theta)$. In some instances it may
be quite tedious to find this p.d.f. Fortunately, this problem can be
avoided if we will but prove the following factorization theorem of
Neyman.

Theorem 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from
a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. The statistic $Y_1 =
u_1(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$ if and only if we can find
two nonnegative functions, $k_1$ and $k_2$, such that

$$f(x_1; \theta)f(x_2; \theta)\cdots f(x_n; \theta) = k_1[u_1(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n),$$

where $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$.

Proof. We shall prove the theorem when the random variables
are of the continuous type. Assume that the factorization is as
stated in the theorem. In our proof we shall make the one-to-one
transformation $y_1 = u_1(x_1, \ldots, x_n)$, $y_2 = u_2(x_1, \ldots, x_n)$, $\ldots$, $y_n =
u_n(x_1, \ldots, x_n)$ having the inverse functions $x_1 = w_1(y_1, \ldots, y_n)$,
$x_2 = w_2(y_1, \ldots, y_n)$, $\ldots$, $x_n = w_n(y_1, \ldots, y_n)$ and Jacobian $J$. The
joint p.d.f. of the statistics $Y_1, Y_2, \ldots, Y_n$ is then given by

$$g(y_1, y_2, \ldots, y_n; \theta) = k_1(y_1; \theta)\, k_2(w_1, w_2, \ldots, w_n)\,|J|,$$

where $w_i = w_i(y_1, y_2, \ldots, y_n)$, $i = 1, 2, \ldots, n$. The p.d.f. of $Y_1$, say
$g_1(y_1; \theta)$, is given by

$$g_1(y_1; \theta) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(y_1, y_2, \ldots, y_n; \theta)\, dy_2 \cdots dy_n
= k_1(y_1; \theta) \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} |J|\, k_2(w_1, w_2, \ldots, w_n)\, dy_2 \cdots dy_n.$$

Now the function $k_2$ does not depend upon $\theta$. Nor is $\theta$ involved in
either the Jacobian $J$ or the limits of integration. Hence the $(n - 1)$-fold
integral in the right-hand member of the preceding equation is
a function of $y_1$ alone, say $m(y_1)$. Thus

$$g_1(y_1; \theta) = k_1(y_1; \theta)\, m(y_1).$$

If $m(y_1) = 0$, then $g_1(y_1; \theta) = 0$. If $m(y_1) > 0$, we can write

$$k_1[u_1(x_1, \ldots, x_n); \theta] = \frac{g_1[u_1(x_1, \ldots, x_n); \theta]}{m[u_1(x_1, \ldots, x_n)]},$$

and the assumed factorization becomes

$$f(x_1; \theta) \cdots f(x_n; \theta) = g_1[u_1(x_1, \ldots, x_n); \theta]\, \frac{k_2(x_1, \ldots, x_n)}{m[u_1(x_1, \ldots, x_n)]}.$$

Since neither the function $k_2$ nor the function $m$ depends upon $\theta$, then
in accordance with the definition, $Y_1$ is a sufficient statistic for the
parameter $\theta$.

Conversely, if $Y_1$ is a sufficient statistic for $\theta$, the factorization can
be realized by taking the function $k_1$ to be the p.d.f. of $Y_1$, namely the
function $g_1$. This completes the proof of the theorem.


Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$, where the variance $\sigma^2 > 0$ is known.
If $\bar{x} = \sum_{i=1}^{n} x_i/n$, then

$$\sum_{i=1}^{n} (x_i - \theta)^2 = \sum_{i=1}^{n} [(x_i - \bar{x}) + (\bar{x} - \theta)]^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$$

because

$$2\sum_{i=1}^{n} (x_i - \bar{x})(\bar{x} - \theta) = 2(\bar{x} - \theta)\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$

Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ may be written

$$\left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{n} \exp\left[-\sum_{i=1}^{n} (x_i - \theta)^2/2\sigma^2\right]
= \left\{\exp[-n(\bar{x} - \theta)^2/2\sigma^2]\right\}\left\{\frac{\exp\left[-\sum_{i=1}^{n} (x_i - \bar{x})^2/2\sigma^2\right]}{(\sigma\sqrt{2\pi})^{n}}\right\}.$$

Since the first factor of the right-hand member of this equation depends upon
$x_1, x_2, \ldots, x_n$ only through $\bar{x}$, and since the second factor does not depend
upon $\theta$, the factorization theorem implies that the mean $\bar{X}$ of the sample is,
for any particular value of $\sigma^2$, a sufficient statistic for $\theta$, the mean of the normal
distribution.
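
The factorization in Example 4 can be checked numerically by evaluating both sides for a fixed sample and several values of $\theta$. This is a sketch only: the function names `joint_normal`, `k1`, and `k2`, the sample `xs`, and the value of `sigma` are assumptions made for the illustration.

```python
from math import exp, isclose, pi, sqrt

def joint_normal(xs, theta, sigma):
    """Joint p.d.f. of an i.i.d. N(theta, sigma^2) sample evaluated at xs."""
    n = len(xs)
    return (1 / (sigma * sqrt(2 * pi))) ** n * exp(
        -sum((x - theta) ** 2 for x in xs) / (2 * sigma**2)
    )

def k1(xbar, n, theta, sigma):
    """First factor: depends on the data only through the sample mean xbar."""
    return exp(-n * (xbar - theta) ** 2 / (2 * sigma**2))

def k2(xs, sigma):
    """Second factor: free of theta."""
    n = len(xs)
    xbar = sum(xs) / n
    return exp(-sum((x - xbar) ** 2 for x in xs) / (2 * sigma**2)) / (sigma * sqrt(2 * pi)) ** n

xs = [4.1, 5.3, 6.0, 4.8]
sigma = 1.5
xbar = sum(xs) / len(xs)
for theta in (3.0, 5.0, 7.5):
    print(theta, joint_normal(xs, theta, sigma), k1(xbar, len(xs), theta, sigma) * k2(xs, sigma))
```

The two printed columns should agree for every $\theta$, reflecting the identity $\sum (x_i - \theta)^2 = \sum (x_i - \bar{x})^2 + n(\bar{x} - \theta)^2$ used in the example.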


We could have used the definition in the preceding example because
we know that $\bar{X}$ is $N(\theta, \sigma^2/n)$. Let us now consider an example in which
the use of the definition is inappropriate.

Example 5. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution with p.d.f.

$$f(x; \theta) = \begin{cases} \theta x^{\theta - 1}, & 0 < x < 1, \\ 0 & \text{elsewhere,} \end{cases}$$

where $0 < \theta$. We shall use the factorization theorem to prove that the product
$u_1(X_1, X_2, \ldots, X_n) = X_1 X_2 \cdots X_n$ is a sufficient statistic for $\theta$. The joint p.d.f.
of $X_1, X_2, \ldots, X_n$ is

$$\theta^{n}(x_1 x_2 \cdots x_n)^{\theta - 1} = [\theta^{n}(x_1 x_2 \cdots x_n)^{\theta}]\left(\frac{1}{x_1 x_2 \cdots x_n}\right),$$

where $0 < x_i < 1$, $i = 1, 2, \ldots, n$. In the factorization theorem let

$$k_1[u_1(x_1, x_2, \ldots, x_n); \theta] = \theta^{n}(x_1 x_2 \cdots x_n)^{\theta}$$

and

$$k_2(x_1, x_2, \ldots, x_n) = \frac{1}{x_1 x_2 \cdots x_n}.$$

Since $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$, the product $X_1 X_2 \cdots X_n$ is
a sufficient statistic for $\theta$.
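
The factorization used in Example 5 is easy to confirm numerically. The sketch below is illustrative, with hypothetical function names and an arbitrary sample from the interval $(0, 1)$; it evaluates the joint p.d.f. and the product $k_1 k_2$ at several values of $\theta$.

```python
from math import isclose, prod

def joint_pdf(xs, theta):
    """Joint p.d.f. of an i.i.d. sample from f(x; theta) = theta x^(theta-1), 0 < x < 1."""
    return prod(theta * x ** (theta - 1) for x in xs)

def k1(u, n, theta):
    """theta^n u^theta, where u is the product of the observations."""
    return theta**n * u**theta

def k2(xs):
    """1 / (x_1 x_2 ... x_n), which does not involve theta."""
    return 1 / prod(xs)

xs = [0.2, 0.55, 0.9, 0.35]
u = prod(xs)                      # value of the statistic u_1 = x_1 x_2 ... x_n
for theta in (0.5, 1.0, 3.0):
    print(theta, joint_pdf(xs, theta), k1(u, len(xs), theta) * k2(xs))
```

Because $k_1$ sees the data only through the product `u`, any two samples with the same product have likelihoods that differ only by the $\theta$-free factor $k_2$.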


There is a tendency for some readers to apply incorrectly the
factorization theorem in those instances in which the domain of
positive probability density depends upon the parameter $\theta$. This is due
to the fact that they do not give proper consideration to the domain
of the function $k_2(x_1, x_2, \ldots, x_n)$. This will be illustrated in the next
example.

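Example 6. In Example 3 with $f(x; \theta) = e^{-(x - \theta)} I_{(\theta, \infty)}(x)$, it was found
that the first order statistic $Y_1$ is a sufficient statistic for $\theta$. To illustrate our
point about not considering the domain of the function, take $n = 3$ and note
that

$$e^{-(x_1 - \theta)}e^{-(x_2 - \theta)}e^{-(x_3 - \theta)} = [e^{-3\max x_i + 3\theta}][e^{-x_1 - x_2 - x_3 + 3\max x_i}]$$

or a similar expression. Certainly, in the latter formula, there is no $\theta$ in the
second factor and it might be assumed that $Y_3 = \max X_i$ is a sufficient
statistic for $\theta$. Of course, this is incorrect because we should have written the
joint p.d.f. of $X_1, X_2, X_3$ as

$$\prod_{i=1}^{3} [e^{-(x_i - \theta)} I_{(\theta, \infty)}(x_i)] = [e^{3\theta} I_{(\theta, \infty)}(\min x_i)][e^{-x_1 - x_2 - x_3}]$$

because $I_{(\theta, \infty)}(\min x_i) = I_{(\theta, \infty)}(x_1) I_{(\theta, \infty)}(x_2) I_{(\theta, \infty)}(x_3)$. A similar statement cannot
be made with $\max x_i$. Thus $Y_1 = \min X_i$ is the sufficient statistic for $\theta$, not
$Y_3 = \max X_i$.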
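
The pitfall in Example 6 shows up clearly once the indicator functions are coded explicitly. The sketch below is an illustration under assumptions: the three samples and the function name `joint_pdf` are hypothetical. It uses a consequence of the factorization theorem: if a statistic is sufficient, two samples with the same value of that statistic have joint p.d.f.s whose dependence on $\theta$ is identical, so one cannot vanish at a $\theta$ where the other is positive.

```python
from math import exp, isclose

def joint_pdf(xs, theta):
    """Joint p.d.f. of Example 6, including the indicator I_(theta, inf)(x_i) for every x_i."""
    if min(xs) <= theta:          # some indicator is zero, so the whole product is zero
        return 0.0
    return exp(-sum(x - theta for x in xs))

a = (1.0, 3.0, 5.0)
same_min = (1.0, 2.0, 2.0)        # shares min(a) = 1
same_max = (4.0, 4.5, 5.0)        # shares max(a) = 5

# Same minimum: the ratio of joint p.d.f.s is the same wherever both are positive ...
print(joint_pdf(a, 0.5) / joint_pdf(same_min, 0.5))
print(joint_pdf(a, -1.0) / joint_pdf(same_min, -1.0))
# ... and both vanish together once theta reaches the common minimum.
print(joint_pdf(a, 1.5), joint_pdf(same_min, 1.5))

# Same maximum: at theta = 2 one joint p.d.f. is zero and the other is not,
# so the dependence on theta is not carried by the maximum alone.
print(joint_pdf(a, 2.0), joint_pdf(same_max, 2.0))
```

Dropping the `min(xs) <= theta` check in `joint_pdf` reproduces the mistaken factorization: without the indicators, the maximum appears to work just as well as the minimum.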


EXERCISES 

7.10. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(0, \theta)$, $0 < \theta < \infty$. Show that $\sum_{i=1}^{n} X_i^2$ is a sufficient statistic for $\theta$.

7.11. Prove that the sum of the observations of a random sample of size $n$
from a Poisson distribution having parameter $\theta$, $0 < \theta < \infty$, is a sufficient
statistic for $\theta$.

7.12. Show that the $n$th order statistic of a random sample of size $n$ from the
uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, $0 < \theta < \infty$,
zero elsewhere, is a sufficient statistic for $\theta$. Generalize this result by
considering the p.d.f. $f(x; \theta) = Q(\theta)M(x)$, $0 < x < \theta$, $0 < \theta < \infty$, zero
elsewhere. Here, of course,

$$\int_{0}^{\theta} M(x)\, dx = \frac{1}{Q(\theta)}.$$

7.13. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a geometric
distribution that has p.d.f. $f(x; \theta) = (1 - \theta)^{x}\theta$, $x = 0, 1, 2, \ldots$, $0 < \theta < 1$,
zero elsewhere. Show that $\sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$.

7.14. Show that the sum of the observations of a random sample of size $n$
from a gamma distribution that has p.d.f. $f(x; \theta) = (1/\theta)e^{-x/\theta}$, $0 < x < \infty$,
$0 < \theta < \infty$, zero elsewhere, is a sufficient statistic for $\theta$.

7.15. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from a beta distribution
with parameters $\alpha = \theta > 0$ and $\beta = 2$. Show that the product $X_1 X_2 \cdots X_n$
is a sufficient statistic for $\theta$.

7.16. Show that the product of the sample observations is a sufficient statistic
for $\theta > 0$ if the random sample is taken from a gamma distribution with
parameters $\alpha = \theta$ and $\beta = 6$.

7.17. What is the sufficient statistic for $\theta$ if the sample arises from a beta
distribution in which $\alpha = \beta = \theta > 0$?


7.3 Properties of a Sufficient Statistic 

Suppose that a random sample $X_1, X_2, \ldots, X_n$ is taken from a
distribution with p.d.f. $f(x; \theta)$ that depends upon one parameter $\theta \in \Omega$.
Say that a sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$ exists and
has p.d.f. $g_1(y_1; \theta)$. Now consider two statisticians, A and B. The first
statistician, A, has all of the observed data $x_1, x_2, \ldots, x_n$; but the
second, B, has only the value $y_1$ of the sufficient statistic. Clearly, A has
as much information as does B. However, it turns out that B is as well
off as A in making statistical inferences about $\theta$ in the following sense.
Since the conditional probability of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$,
does not depend upon $\theta$, statistician B can create some pseudo
observations, say $Z_1, Z_2, \ldots, Z_n$, that provide a likelihood function
that is proportional to that based on $X_1, X_2, \ldots, X_n$ with the factor
$g_1(y_1; \theta)$ being common to each likelihood. The other factors of the two
likelihood functions do not depend upon $\theta$. Hence, in either case,
inferences, like the m.l.e. of $\theta$, would be based upon the sufficient
statistic $Y_1$.

To make this clear, we provide two illustrations. The first is based
upon Example 1 of Section 7.2. There the ratio of the likelihood
function and the p.d.f. of $Y_1$ is

$$\frac{L(\theta)}{g_1(y_1; \theta)} = \frac{1}{\binom{n}{y_1}},$$

where $y_1 = \sum_{i=1}^{n} x_i$. Recall that each $x_i$ is equal to zero or 1, and thus $y_1$
is the sum of $y_1$ ones and $(n - y_1)$ zeros. Say we know only the value
$y_1$ and not $x_1, x_2, \ldots, x_n$, so we create pseudo values $z_1, z_2, \ldots, z_n$ by
arranging at random $y_1$ ones and $(n - y_1)$ zeros so that the probability
of each arrangement is $p = 1 \big/ \binom{n}{y_1}$. Thus the probability that these
$z$-values equal the original $x$-values is $p$, and hence it is highly
unlikely, namely with probability $p$, that those two sets of values
would be equal. Yet the two likelihood functions are proportional,
namely

$$\theta^{\sum x_i} (1 - \theta)^{n - \sum x_i} \propto \theta^{y_1} (1 - \theta)^{n - y_1} = \theta^{\sum z_i} (1 - \theta)^{n - \sum z_i},$$

because $y_1 = \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} z_i$. Clearly, the m.l.e. of $\theta$, using either
expression, is $y_1/n$.
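Statistician B's construction in this first illustration can be sketched in a few lines. This is an illustrative sketch and not part of the original text; the names `likelihood` and `pseudo_sample`, the data, and the seed are all assumptions.

```python
import random

def likelihood(sample, theta):
    """Bernoulli likelihood theta^(sum) * (1 - theta)^(n - sum)."""
    y = sum(sample)
    return theta ** y * (1 - theta) ** (len(sample) - y)

def pseudo_sample(n, y1, rng):
    """Arrange y1 ones and n - y1 zeros at random; all orders equally likely."""
    z = [1] * y1 + [0] * (n - y1)
    rng.shuffle(z)
    return z

rng = random.Random(0)            # fixed seed, chosen arbitrarily
x = [1, 0, 0, 1, 1, 0, 1, 0]      # hypothetical data seen only by A
y1 = sum(x)                       # the only value B knows
z = pseudo_sample(len(x), y1, rng)
for theta in (0.2, 0.5, 0.8):
    print(theta, likelihood(x, theta), likelihood(z, theta))
```

The pseudo values will usually differ from the original ones, yet both likelihoods are the same function of $\theta$ because they share the value $y_1$.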

The next illustration refers back to Exercise 7.9. There the sample
arose from a Poisson distribution with parameter $\theta > 0$. It turns
out that $Y_1 = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$ (see Exercise 7.11). In
Exercise 7.9 we found that

$$\frac{L(\theta)}{g_1(y_1; \theta)} = \frac{y_1!}{x_1!\, x_2! \cdots x_n!} \left(\frac{1}{n}\right)^{y_1},$$

when $L(\theta)$ is the likelihood function based upon $x_1, x_2, \ldots, x_n$. Since
this is a multinomial distribution that does not depend upon $\theta$, we can
generate some values of $Z_1, Z_2, \ldots, Z_n$, say $z_1, z_2, \ldots, z_n$, that have
this multinomial distribution. It is interesting to note that while in the
previous example the $z$-values provided an arrangement of the
$x$-values, here the $z$-values do not need to equal those $x$-values. That
is, the values $z_1, z_2, \ldots, z_n$ do not necessarily provide an arrangement
of $x_1, x_2, \ldots, x_n$. It is, however, true that $\sum z_i = \sum x_i$. Of course,
from the way the $z$-values were obtained, the two likelihood functions
enjoy the property of being proportional, namely

$$g_1(y_1; \theta)\, \frac{y_1!}{x_1!\, x_2! \cdots x_n!} \left(\frac{1}{n}\right)^{y_1} \propto g_1(y_1; \theta)\, \frac{y_1!}{z_1!\, z_2! \cdots z_n!} \left(\frac{1}{n}\right)^{y_1}.$$

Thus, for illustration, using either of these likelihood functions, the
m.l.e. of $\theta$ is $y_1/n$ because this is the value of $\theta$ that maximizes $g_1(y_1; \theta)$.
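The Poisson illustration can be sketched the same way. The code below is an assumption-laden illustration and not part of the original text: the function names, the data, and the seed are hypothetical, and the pseudo values are generated by dropping $y_1$ units into $n$ equally likely cells, which is one way to draw from the multinomial distribution described above.

```python
import math
import random

def poisson_likelihood(sample, theta):
    """Product of Poisson p.d.f.s theta^x * e^(-theta) / x!."""
    return math.prod(theta ** x * math.exp(-theta) / math.factorial(x)
                     for x in sample)

def multinomial_pseudo_sample(n, y1, rng):
    """Drop y1 units into n cells, each cell chosen with probability 1/n."""
    z = [0] * n
    for _ in range(y1):
        z[rng.randrange(n)] += 1
    return z

rng = random.Random(1)            # fixed seed, chosen arbitrarily
x = [3, 0, 2, 5, 1]               # hypothetical data seen only by A
y1 = sum(x)                       # the only value B knows
z = multinomial_pseudo_sample(len(x), y1, rng)
ratios = [poisson_likelihood(x, t) / poisson_likelihood(z, t)
          for t in (0.5, 2.0, 4.0)]
print(z, ratios)                  # the ratio should not change with theta
```

The ratio of the two likelihoods is constant in $\theta$, so both are maximized at the same value $y_1/n$.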

We have considered how the statistician knowing only the
value of the sufficient statistic can create a sample that satisfies the
likelihood principle and thus, in this sense, is as well off as
the statistician who has knowledge of all the data. So let us now state
a fairly obvious theorem that relates the m.l.e. of $\theta$ to a sufficient
statistic.


Theorem 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. If a sufficient statistic
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$ exists and if a maximum likelihood
estimator $\hat{\theta}$ of $\theta$ also exists uniquely, then $\hat{\theta}$ is a function of
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$.

Proof. Let $g_1(y_1; \theta)$ be the p.d.f. of $Y_1$. Then by the definition of
sufficiency, the likelihood function

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta) = g_1[u_1(x_1, \ldots, x_n); \theta]\, H(x_1, \ldots, x_n),$$

where $H(x_1, \ldots, x_n)$ does not depend upon $\theta$. Thus $L$ and $g_1$, as
functions of $\theta$, are maximized simultaneously. Since there is one and
only one value of $\theta$ that maximizes $L$ and hence $g_1[u_1(x_1, \ldots, x_n); \theta]$,
that value of $\theta$ must be a function of $u_1(x_1, x_2, \ldots, x_n)$. Thus the m.l.e.
$\hat{\theta}$ is a function of the sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$.

Let us consider another important property possessed by a
sufficient statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ for $\theta$. The conditional p.d.f.
of a second statistic, say $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, given $Y_1 = y_1$, does
not depend upon $\theta$. On intuitive grounds, we might surmise that the
conditional p.d.f. of $Y_2$, given some linear function $aY_1 + b$, $a \neq 0$,
of $Y_1$, does not depend upon $\theta$. That is, it seems as though the
random variable $aY_1 + b$ is also a sufficient statistic for $\theta$. This
conjecture is correct. In fact, every function $Z = u(Y_1)$, or
$Z = u[u_1(X_1, X_2, \ldots, X_n)] = v(X_1, X_2, \ldots, X_n)$, not involving $\theta$, with a
single-valued inverse $Y_1 = w(Z)$, is also a sufficient statistic for $\theta$. To
prove this, we write, in accordance with the factorization theorem,

$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1[u_1(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n).$$

However, we find that $y_1 = w(z)$ or, equivalently, $u_1(x_1, x_2, \ldots, x_n) =
w[v(x_1, x_2, \ldots, x_n)]$, which is not a function of $\theta$. Hence

$$f(x_1; \theta) \cdots f(x_n; \theta) = k_1\{w[v(x_1, \ldots, x_n)]; \theta\}\, k_2(x_1, x_2, \ldots, x_n).$$

Since the first factor of the right-hand member of this equation is a
function of $z = v(x_1, \ldots, x_n)$ and $\theta$, while the second factor does not
depend upon $\theta$, the factorization theorem implies that $Z = u(Y_1)$ is also
a sufficient statistic for $\theta$.

Possibly, the preceding observation is obvious if we think about the
sufficient statistic $Y_1$ partitioning the sample space in such a way that
the conditional probability of $X_1, X_2, \ldots, X_n$, given $Y_1 = y_1$, does not
depend upon $\theta$. We say this because every function $Z = u(Y_1)$ with a
single-valued inverse $Y_1 = w(Z)$ would partition the sample space in
exactly the same way, that is, the set of points

$$\{(x_1, x_2, \ldots, x_n) : u_1(x_1, x_2, \ldots, x_n) = y_1\},$$

for each $y_1$, is exactly the same as

$$\{(x_1, x_2, \ldots, x_n) : v(x_1, x_2, \ldots, x_n) = u(y_1)\}$$

because $w[v(x_1, x_2, \ldots, x_n)] = u_1(x_1, x_2, \ldots, x_n) = y_1$.
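The partition argument can be made concrete on a small sample space. The sketch below is an illustration and not part of the original text; the Bernoulli sample space with $n = 3$ and the particular functions of $Y_1$ are assumptions chosen for the example.

```python
from itertools import product

def partition(statistic, points):
    """Group sample points by the value of the statistic."""
    cells = {}
    for pt in points:
        cells.setdefault(statistic(pt), []).append(pt)
    # Compare partitions as sets of cells, ignoring the labels of the cells.
    return {frozenset(cell) for cell in cells.values()}

points = list(product((0, 1), repeat=3))      # Bernoulli sample space, n = 3

def u1(pt):
    """Y1 = X1 + X2 + X3."""
    return sum(pt)

def v(pt):
    """Z = 2*Y1 + 5, a function of Y1 with a single-valued inverse."""
    return 2 * sum(pt) + 5

def collapsed(pt):
    """(Y1 - 1)^2, which merges the cells Y1 = 0 and Y1 = 2."""
    return (sum(pt) - 1) ** 2

print(partition(u1, points) == partition(v, points))
print(partition(u1, points) == partition(collapsed, points))
```

A function with a single-valued inverse only relabels the cells, whereas a function without one merges cells and produces a coarser partition.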

Remark. Throughout the discussion of sufficient statistics, as a matter of
fact throughout much of the mathematics of statistical inference, we hope
the reader recognizes the importance of the assumption of having a certain
model. Clearly, when we say that a statistician having the value of a certain
statistic (here sufficient) is as well off in making statistical inferences as the
statistician who has all of the data, we depend upon the fact that a certain
model is true. For illustration, knowing that we have i.i.d. variables, each with
p.d.f. $f(x; \theta)$, is extremely important; because if that $f(x; \theta)$ is incorrect or if
the independence assumption does not hold, our resulting inferences could
be very bad. The statistician with all the data could, and should, check to
see if the model is reasonably good. Such procedures checking the model are
often called model diagnostics, the discussion of which we leave to a more
applied course in statistics.
We now consider a result of Rao and Blackwell from which we see
that we need consider only functions of the sufficient statistic in finding
the unbiased point estimates of parameters. In showing this, we
can refer back to a result of Section 2.2: If $X_1$ and $X_2$ are random
variables and certain expectations exist, then

$$E(X_2) = E[E(X_2 \mid X_1)]$$

and

$$\operatorname{var}(X_2) \geq \operatorname{var}[E(X_2 \mid X_1)].$$

For the adaptation in context of sufficient statistics, we let the sufficient
statistic $Y_1$ be $X_1$ and $Y_2$, an unbiased statistic of $\theta$, be $X_2$. Thus, with
$E(Y_2 \mid y_1) = \varphi(y_1)$, we have

$$\theta = E(Y_2) = E[\varphi(Y_1)]$$

and

$$\operatorname{var}(Y_2) \geq \operatorname{var}[\varphi(Y_1)].$$

That is, through this conditioning, the function $\varphi(Y_1)$ of the sufficient
statistic $Y_1$ is an unbiased estimator of $\theta$ having smaller variance than
that of the unbiased estimator $Y_2$. We summarize this discussion more
formally in the following theorem, which can be attributed to Rao and
Blackwell.

Theorem 3. Let $X_1, X_2, \ldots, X_n$, $n$ a fixed positive integer, denote a
random sample from a distribution (continuous or discrete) that has p.d.f.
$f(x; \theta)$, $\theta \in \Omega$. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$,
and let $Y_2 = u_2(X_1, X_2, \ldots, X_n)$, not a function of $Y_1$ alone, be an
unbiased estimator of $\theta$. Then $E(Y_2 \mid y_1) = \varphi(y_1)$ defines a statistic $\varphi(Y_1)$.
This statistic $\varphi(Y_1)$ is a function of the sufficient statistic for $\theta$; it is an
unbiased estimator of $\theta$; and its variance is less than that of $Y_2$.

This theorem tells us that in our search for an unbiased minimum
variance estimator of a parameter, we may, if a sufficient statistic for
the parameter exists, restrict that search to functions of the sufficient
statistic. For if we begin with an unbiased estimator $Y_2$ that is not a
function of the sufficient statistic $Y_1$ alone, then we can always improve
on this by computing $E(Y_2 \mid y_1) = \varphi(y_1)$ so that $\varphi(Y_1)$ is an unbiased
estimator with smaller variance than that of $Y_2$.
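The improvement promised by Theorem 3 can be computed exactly in a small case. The sketch below is an illustration and not part of the original text: it assumes a Bernoulli sample, takes the crude unbiased estimator $Y_2 = X_1$, and conditions on $Y_1 = \sum X_i$ by enumerating the sample space with exact fractions. The function name and the choices $n = 4$, $\theta = 1/3$ are hypothetical.

```python
from fractions import Fraction
from itertools import product

def rao_blackwell_bernoulli(n, theta):
    """Exact E(X1 | Y1 = y) and the two variances for a Bernoulli sample."""
    prob_y = {}      # P(Y1 = y)
    sum_x1 = {}      # sum of x1 * P(sample) over samples with Y1 = y
    for xs in product((0, 1), repeat=n):
        y = sum(xs)
        p = theta ** y * (1 - theta) ** (n - y)
        prob_y[y] = prob_y.get(y, 0) + p
        sum_x1[y] = sum_x1.get(y, 0) + xs[0] * p
    phi = {y: sum_x1[y] / prob_y[y] for y in prob_y}       # phi(y) = E(X1 | y)
    mean_phi = sum(phi[y] * prob_y[y] for y in prob_y)
    var_phi = sum((phi[y] - mean_phi) ** 2 * prob_y[y] for y in prob_y)
    var_x1 = theta * (1 - theta)                           # variance of X1
    return phi, mean_phi, var_phi, var_x1

phi, mean_phi, var_phi, var_x1 = rao_blackwell_bernoulli(4, Fraction(1, 3))
print(phi)                         # conditional means, one per value of y
print(mean_phi, var_phi, var_x1)   # same mean, smaller variance
```

In this case conditioning turns $X_1$ into $Y_1/n$, the sample mean, which keeps the mean $\theta$ and divides the variance by $n$.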

After Theorem 3 many students believe that it is necessary to find
first some unbiased estimator $Y_2$ in their search for $\varphi(Y_1)$, an unbiased
estimator of $\theta$ based upon the sufficient statistic $Y_1$. This is not the case
at all, and Theorem 3 simply convinces us that we can restrict our
search for a best estimator to functions of $Y_1$. It frequently happens
that $E(Y_1) = a\theta + b$, where $a \neq 0$ and $b$ are constants, and thus
$(Y_1 - b)/a$ is a function of $Y_1$ that is an unbiased estimator of $\theta$. That
is, we can usually find an unbiased estimator based on $Y_1$ without first
finding an estimator $Y_2$. In the next two sections we discover that, in
most instances, if there is one function $\varphi(Y_1)$ that is unbiased, $\varphi(Y_1)$
is the only unbiased estimator based on the sufficient statistic $Y_1$.

Remark. Since the unbiased estimator $\varphi(Y_1)$, where $\varphi(y_1) = E(Y_2 \mid y_1)$, has
variance smaller than that of the unbiased estimator $Y_2$ of $\theta$, students
sometimes reason as follows. Let the function $\Upsilon(y_3) = E[\varphi(Y_1) \mid Y_3 = y_3]$,
where $Y_3$ is another statistic, which is not sufficient for $\theta$. By the
Rao-Blackwell theorem, we have that $E[\Upsilon(Y_3)] = \theta$ and $\Upsilon(Y_3)$ has a smaller
variance than does $\varphi(Y_1)$. Accordingly, $\Upsilon(Y_3)$ must be better than $\varphi(Y_1)$ as
an unbiased estimator of $\theta$. But this is not true because $Y_3$ is not sufficient; thus
$\theta$ is present in the conditional distribution of $Y_1$, given $Y_3 = y_3$, and the
conditional mean $\Upsilon(y_3)$. So although indeed $E[\Upsilon(Y_3)] = \theta$, $\Upsilon(Y_3)$ is not even
a statistic because it involves the unknown parameter $\theta$ and hence cannot be
used as an estimator.


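Example 1. Let $X_1, X_2, X_3$ be a random sample from an exponential
distribution with mean $\theta > 0$, so that the joint p.d.f. is

$$\left(\frac{1}{\theta}\right)^3 e^{-(x_1 + x_2 + x_3)/\theta}, \quad 0 < x_i < \infty,$$

$i = 1, 2, 3$, zero elsewhere. From the factorization theorem, we see that
$Y_1 = X_1 + X_2 + X_3$ is a sufficient statistic for $\theta$. Of course,

$$E(Y_1) = E(X_1 + X_2 + X_3) = 3\theta,$$

and thus $Y_1/3 = \bar{X}$ is a function of the sufficient statistic that is an unbiased
estimator of $\theta$.

In addition, let $Y_2 = X_2 + X_3$ and $Y_3 = X_3$. The one-to-one transformation
defined by

$$x_1 = y_1 - y_2, \quad x_2 = y_2 - y_3, \quad x_3 = y_3$$

has Jacobian equal to 1 and the joint p.d.f. of $Y_1, Y_2, Y_3$ is

$$g(y_1, y_2, y_3; \theta) = \left(\frac{1}{\theta}\right)^3 e^{-y_1/\theta}, \quad 0 < y_3 < y_2 < y_1 < \infty,$$

zero elsewhere. The marginal p.d.f. of $Y_1$ and $Y_3$ is found by integrating out
$y_2$ to obtain

$$g_{13}(y_1, y_3; \theta) = \left(\frac{1}{\theta}\right)^3 (y_1 - y_3) e^{-y_1/\theta}, \quad 0 < y_3 < y_1 < \infty,$$

zero elsewhere. The p.d.f. of $Y_3$ alone is

$$g_3(y_3; \theta) = \frac{1}{\theta} e^{-y_3/\theta}, \quad 0 < y_3 < \infty,$$

zero elsewhere, since $Y_3 = X_3$ is an observation of a random sample from this
exponential distribution.

Accordingly, the conditional p.d.f. of $Y_1$, given $Y_3 = y_3$, is

$$g_{1|3}(y_1 \mid y_3) = \frac{g_{13}(y_1, y_3; \theta)}{g_3(y_3; \theta)} = \left(\frac{1}{\theta}\right)^2 (y_1 - y_3) e^{-(y_1 - y_3)/\theta}, \quad 0 < y_3 < y_1 < \infty,$$

zero elsewhere. Thus

$$E\left(\frac{Y_1}{3} \,\Big|\, y_3\right) = E\left(\frac{Y_1 - Y_3}{3} \,\Big|\, y_3\right) + E\left(\frac{Y_3}{3} \,\Big|\, y_3\right)$$

$$= \frac{1}{3} \int_{y_3}^{\infty} \left(\frac{1}{\theta}\right)^2 (y_1 - y_3)^2 e^{-(y_1 - y_3)/\theta}\, dy_1 + \frac{y_3}{3}$$

$$= \frac{1}{3} \frac{\Gamma(3)\theta^3}{\theta^2} + \frac{y_3}{3} = \frac{2\theta}{3} + \frac{y_3}{3} = \Upsilon(y_3).$$

Of course, $E[\Upsilon(Y_3)] = \theta$ and $\operatorname{var}[\Upsilon(Y_3)] \leq \operatorname{var}(Y_1/3)$, but $\Upsilon(Y_3)$ is not
a statistic as it involves $\theta$ and cannot be used as an estimator of $\theta$. This
illustrates the preceding remark.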
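The conditional mean in this example, $2\theta/3 + y_3/3$, can be checked by numerical integration. The sketch below is an illustration and not part of the original text; the midpoint rule, the truncation point of the integral, and the test values of $\theta$ and $y_3$ are assumptions.

```python
import math

def conditional_pdf(y1, y3, theta):
    """Conditional p.d.f. of Y1 given Y3 = y3 (zero for y1 <= y3)."""
    if y1 <= y3:
        return 0.0
    d = y1 - y3
    return d * math.exp(-d / theta) / theta ** 2

def upsilon_numeric(y3, theta, steps=200_000, width=60.0):
    """Midpoint-rule approximation of E(Y1/3 | Y3 = y3).

    The integral is truncated at y3 + width * theta, beyond which the
    integrand is negligible.
    """
    h = width * theta / steps
    total = 0.0
    for i in range(steps):
        y1 = y3 + (i + 0.5) * h
        total += (y1 / 3.0) * conditional_pdf(y1, y3, theta) * h
    return total

theta, y3 = 2.0, 1.5
print(upsilon_numeric(y3, theta), 2 * theta / 3 + y3 / 3)
```

Changing `theta` while holding `y3` fixed changes the result, which is the point of the example: the conditional mean depends on the unknown parameter and so is not a statistic.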


EXERCISES 

7.18. In each of the Exercises 7.10, 7.11, 7.13, and 7.14, show that the m.l.e.
of $\theta$ is a function of the sufficient statistic for $\theta$.

7.19. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ be the order statistics of a random sample
of size 5 from the uniform distribution having p.d.f. $f(x; \theta) = 1/\theta$,
$0 < x < \theta$, $0 < \theta < \infty$, zero elsewhere. Show that $2Y_3$ is an unbiased
estimator of $\theta$. Determine the joint p.d.f. of $Y_3$ and the sufficient statistic
$Y_5$ for $\theta$. Find the conditional expectation $E(2Y_3 \mid y_5) = \varphi(y_5)$. Compare the
variances of $2Y_3$ and $\varphi(Y_5)$.

Hint: All of the integrals needed in this exercise can be evaluated by
making a change of variable such as $z = y/\theta$ and using the results associated
with the beta p.d.f.; see Section 4.4.

7.20. If $X_1, X_2$ is a random sample of size 2 from a distribution having p.d.f.
$f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere, find the joint
p.d.f. of the sufficient statistic $Y_1 = X_1 + X_2$ for $\theta$ and $Y_2 = X_2$. Show that
$Y_2$ is an unbiased estimator of $\theta$ with variance $\theta^2$. Find $E(Y_2 \mid y_1) = \varphi(y_1)$ and
the variance of $\varphi(Y_1)$.

7.21. Let the random variables $X$ and $Y$ have the joint p.d.f.
$f(x, y) = (2/\theta^2) e^{-(x + y)/\theta}$, $0 < x < y < \infty$, zero elsewhere.

(a) Show that the mean and the variance of $Y$ are, respectively, $3\theta/2$ and
$5\theta^2/4$.

(b) Show that $E(Y \mid x) = x + \theta$. In accordance with the theory, the expected
value of $X + \theta$ is that of $Y$, namely, $3\theta/2$, and the variance of $X + \theta$ is
less than that of $Y$. Show that the variance of $X + \theta$ is in fact $\theta^2/4$.

7.22. In each of Exercises 7.10, 7.11, and 7.12, compute the expected value
of the given sufficient statistic and, in each case, determine an unbiased
estimator of $\theta$ that is a function of that sufficient statistic alone.


7.4 Completeness and Uniqueness

Let $X_1, X_2, \ldots, X_n$ be a random sample from the Poisson distribution
that has p.d.f.

$$f(x; \theta) = \frac{\theta^x e^{-\theta}}{x!}, \quad x = 0, 1, 2, \ldots; \quad 0 < \theta;$$

and $f(x; \theta) = 0$ elsewhere.

From Exercise 7.11 of Section 7.2 we know that $Y_1 = \sum_{i=1}^{n} X_i$ is a
sufficient statistic for $\theta$ and its p.d.f. is

$$g_1(y_1; \theta) = \frac{(n\theta)^{y_1} e^{-n\theta}}{y_1!}, \quad y_1 = 0, 1, 2, \ldots,$$

and $g_1(y_1; \theta) = 0$ elsewhere.

Let us consider the family $\{g_1(y_1; \theta) : 0 < \theta\}$ of probability density
functions. Suppose that the function $u(Y_1)$ of $Y_1$ is such that
$E[u(Y_1)] = 0$ for every $\theta > 0$. We shall show that this requires $u(y_1)$
to be zero at every point $y_1 = 0, 1, 2, \ldots$. That is,

$$E[u(Y_1)] = 0, \quad 0 < \theta,$$

implies that

$$0 = u(0) = u(1) = u(2) = u(3) = \cdots.$$

We have for all $\theta > 0$ that

$$0 = E[u(Y_1)] = \sum_{y_1 = 0}^{\infty} u(y_1) \frac{(n\theta)^{y_1} e^{-n\theta}}{y_1!} = e^{-n\theta} \left[u(0) + u(1) \frac{n\theta}{1!} + u(2) \frac{(n\theta)^2}{2!} + \cdots\right].$$

Since $e^{-n\theta}$ does not equal zero, we have that

$$0 = u(0) + [n u(1)]\theta + \left[\frac{n^2 u(2)}{2}\right]\theta^2 + \cdots.$$

However, if such an infinite series converges to zero for all $\theta > 0$, then
each of the coefficients must equal zero. That is,

$$u(0) = 0, \quad n u(1) = 0, \quad \frac{n^2 u(2)}{2} = 0, \ldots$$

and thus $0 = u(0) = u(1) = u(2) = \cdots$, as we wanted to show. Of
course, the condition $E[u(Y_1)] = 0$ for all $\theta > 0$ does not place any
restriction on $u(y_1)$ when $y_1$ is not a nonnegative integer. So we see that,
in this illustration, $E[u(Y_1)] = 0$ for all $\theta > 0$ requires that $u(y_1)$ equals
zero except on a set of points that has probability zero for each p.d.f.
$g_1(y_1; \theta)$, $0 < \theta$. From the following definition we observe that the
family $\{g_1(y_1; \theta) : 0 < \theta\}$ is complete.
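The p.d.f. of $Y_1$ quoted in this illustration, Poisson with mean $n\theta$, can be confirmed by convolving the individual Poisson p.d.f.s. The sketch below is an illustration and not part of the original text; the function names and the values of $n$ and $\theta$ are assumptions.

```python
import math

def poisson_pmf(k, mean):
    """Poisson p.d.f. mean^k * e^(-mean) / k!."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def pmf_of_sum(n, theta, y):
    """P(X1 + ... + Xn = y) by repeated convolution of Poisson(theta) p.d.f.s.

    Truncating every distribution at y is exact here, because the
    probability of a sum equal to k only involves terms with indices <= k.
    """
    dist = [1.0] + [0.0] * y              # distribution of an empty sum
    single = [poisson_pmf(k, theta) for k in range(y + 1)]
    for _ in range(n):
        dist = [sum(dist[j] * single[k - j] for j in range(k + 1))
                for k in range(y + 1)]
    return dist[y]

n, theta = 3, 0.8
for y in range(5):
    print(y, pmf_of_sum(n, theta, y), poisson_pmf(y, n * theta))
```

The two columns agree, so the series expansion of $E[u(Y_1)]$ in powers of $\theta$ is taken with respect to the correct p.d.f.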

Definition 3. Let the random variable $Z$ of either the continuous
type or the discrete type have a p.d.f. that is one member of the family
$\{h(z; \theta) : \theta \in \Omega\}$. If the condition $E[u(Z)] = 0$, for every $\theta \in \Omega$, requires
that $u(z)$ be zero except on a set of points that has probability zero for
each p.d.f. $h(z; \theta)$, $\theta \in \Omega$, then the family $\{h(z; \theta) : \theta \in \Omega\}$ is called a
complete family of probability density functions.

Remark. In Section 1.9 it was noted that the existence of $E[u(X)]$ implies
that the integral (or sum) converges absolutely. This absolute convergence
was tacitly assumed in our definition of completeness and it is needed to
prove that certain families of probability density functions are complete.

In order to show that certain families of probability density
functions of the continuous type are complete, we must appeal to the
same type of theorem in analysis that we used when we claimed that
the moment-generating function uniquely determines a distribution.
This is illustrated in the next example.
Example 1. Let $Z$ have a p.d.f. that is a member of the family
$\{h(z; \theta) : 0 < \theta < \infty\}$, where

$$h(z; \theta) = \frac{1}{\theta} e^{-z/\theta}, \quad 0 < z < \infty,$$

and $h(z; \theta) = 0$ elsewhere.

Let us say that $E[u(Z)] = 0$ for every $\theta > 0$. That is,

$$\frac{1}{\theta} \int_0^{\infty} u(z) e^{-z/\theta}\, dz = 0, \quad \text{for } \theta > 0.$$

Readers acquainted with the theory of transforms will recognize the integral
in the left-hand member as being essentially the Laplace transform of $u(z)$. In
that theory we learn that the only function $u(z)$ transforming to a function of
$\theta$ which is identically equal to zero is $u(z) = 0$, except (in our terminology) on
a set of points that has probability zero for each $h(z; \theta)$, $0 < \theta$. That is, the
family $\{h(z; \theta) : 0 < \theta < \infty\}$ is complete.

Let the parameter $\theta$ in the p.d.f. $f(x; \theta)$, $\theta \in \Omega$, have a sufficient
statistic $Y_1 = u_1(X_1, X_2, \ldots, X_n)$, where $X_1, X_2, \ldots, X_n$ is a random
sample from this distribution. Let the p.d.f. of $Y_1$ be $g_1(y_1; \theta)$, $\theta \in \Omega$.
It has been seen that, if there is any unbiased estimator $Y_2$ (not a
function of $Y_1$ alone) of $\theta$, then there is at least one function of $Y_1$ that
is an unbiased estimator of $\theta$, and our search for a best estimator of
$\theta$ may be restricted to functions of $Y_1$. Suppose it has been verified that
a certain function $\varphi(Y_1)$, not a function of $\theta$, is such that $E[\varphi(Y_1)] = \theta$
for all values of $\theta$, $\theta \in \Omega$. Let $\psi(Y_1)$ be another function of the sufficient
statistic $Y_1$ alone, so that we also have $E[\psi(Y_1)] = \theta$ for all values of
$\theta$, $\theta \in \Omega$. Hence

$$E[\varphi(Y_1) - \psi(Y_1)] = 0, \quad \theta \in \Omega.$$

If the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ is complete, the function $\varphi(y_1) -
\psi(y_1) = 0$, except on a set of points that has probability zero. That is,
for every other unbiased estimator $\psi(Y_1)$ of $\theta$, we have

$$\varphi(y_1) = \psi(y_1)$$

except possibly at certain special points. Thus, in this sense [namely
$\varphi(y_1) = \psi(y_1)$, except on a set of points with probability zero], $\varphi(Y_1)$
is the unique function of $Y_1$ which is an unbiased estimator of $\theta$. In
accordance with the Rao-Blackwell theorem, $\varphi(Y_1)$ has a smaller
variance than every other unbiased estimator of $\theta$. That is, the
statistic $\varphi(Y_1)$ is the unbiased minimum variance estimator of $\theta$. This
fact is stated in the following theorem of Lehmann and Scheffé.

Theorem 4. Let $X_1, X_2, \ldots, X_n$, $n$ a fixed positive integer, denote a
random sample from a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$, let
$Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let the family
$\{g_1(y_1; \theta) : \theta \in \Omega\}$ of probability density functions be complete. If there
is a function of $Y_1$ that is an unbiased estimator of $\theta$, then this function
of $Y_1$ is the unique unbiased minimum variance estimator of $\theta$. Here
"unique" is used in the sense described in the preceding paragraph.

The statement that $Y_1$ is a sufficient statistic for a parameter $\theta$,
$\theta \in \Omega$, and that the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ of probability density
functions is complete is lengthy and somewhat awkward. We shall
adopt the less descriptive, but more convenient, terminology that $Y_1$
is a complete sufficient statistic for $\theta$. In the next section we study a fairly
large class of probability density functions for which a complete
sufficient statistic $Y_1$ for $\theta$ can be determined by inspection.

EXERCISES 

7.23. If $az^2 + bz + c = 0$ for more than two values of $z$, then $a = b = c = 0$.
Use this result to show that the family $\{b(2, \theta) : 0 < \theta < 1\}$ is complete.

7.24. Show that each of the following families is not complete by finding at
least one nonzero function $u(x)$ such that $E[u(X)] = 0$, for all $\theta > 0$.

(a) $f(x; \theta) = \dfrac{1}{2\theta}$, $-\theta < x < \theta$, where $0 < \theta < \infty$,
and $f(x; \theta) = 0$ elsewhere.

(b) $N(0, \theta)$, where $0 < \theta < \infty$.

7.25. Let $X_1, X_2, \ldots, X_n$ represent a random sample from the discrete
distribution having the probability density function

$$f(x; \theta) = \theta^x (1 - \theta)^{1 - x}, \quad x = 0, 1, \quad 0 < \theta < 1,$$

and $f(x; \theta) = 0$ elsewhere. Show that $Y_1 = \sum_{i=1}^{n} X_i$ is a complete sufficient
statistic for $\theta$. Find the unique function of $Y_1$ that is the unbiased minimum
variance estimator of $\theta$.

Hint: Display $E[u(Y_1)] = 0$, show that the constant term $u(0)$ is equal
to zero, divide both members of the equation by $\theta \neq 0$, and repeat the
argument.

7.26. Consider the family of probability density functions $\{h(z; \theta) : \theta \in \Omega\}$,
where $h(z; \theta) = 1/\theta$, $0 < z < \theta$, zero elsewhere.

(a) Show that the family is complete provided that $\Omega = \{\theta : 0 < \theta < \infty\}$.
Hint: For convenience, assume that $u(z)$ is continuous and note that the
derivative of $E[u(Z)]$ with respect to $\theta$ is equal to zero also.

(b) Show that this family is not complete if $\Omega = \{\theta : 1 < \theta < \infty\}$.
Hint: Concentrate on the interval $0 < z < 1$ and find a nonzero function
$u(z)$ on that interval such that $E[u(Z)] = 0$ for all $\theta > 1$.

7.27. Show that the first order statistic $Y_1$ of a random sample of size $n$ from
the distribution having p.d.f. $f(x; \theta) = e^{-(x - \theta)}$, $\theta < x < \infty$, $-\infty < \theta < \infty$,
zero elsewhere, is a complete sufficient statistic for $\theta$. Find the unique
function of this statistic which is the unbiased minimum variance estimator
of $\theta$.

7.28. Let a random sample of size $n$ be taken from a distribution of the discrete
type with p.d.f. $f(x; \theta) = 1/\theta$, $x = 1, 2, \ldots, \theta$, zero elsewhere, where $\theta$ is an
unknown positive integer.

(a) Show that the largest observation, say $Y$, of the sample is a complete
sufficient statistic for $\theta$.

(b) Prove that

$$\frac{Y^{n+1} - (Y - 1)^{n+1}}{Y^n - (Y - 1)^n}$$

is the unique unbiased minimum variance estimator of $\theta$.

7.5 The Exponential Class of Probability Density Functions

Consider a family $\{f(x; \theta) : \theta \in \Omega\}$ of probability density functions,
where $\Omega$ is the interval set $\Omega = \{\theta : \gamma < \theta < \delta\}$, where $\gamma$ and $\delta$ are known
constants, and where

$$f(x; \theta) = \exp[p(\theta) K(x) + S(x) + q(\theta)], \quad a < x < b, \qquad (1)$$

and $f(x; \theta) = 0$ elsewhere.

A p.d.f. of the form (1) is said to be a member of the exponential
class of probability density functions of the continuous type. If, in
addition,

1. neither $a$ nor $b$ depends upon $\theta$, $\gamma < \theta < \delta$,

2. $p(\theta)$ is a nontrivial continuous function of $\theta$, $\gamma < \theta < \delta$,

3. each of $K'(x) \not\equiv 0$ and $S(x)$ is a continuous function of $x$, $a < x < b$,

we say that we have a regular case of the exponential class. A p.d.f.

$$f(x; \theta) = \exp[p(\theta) K(x) + S(x) + q(\theta)], \quad x = a_1, a_2, a_3, \ldots,$$

and $f(x; \theta) = 0$ elsewhere, is said to represent a regular case of the
exponential class of probability density functions of the discrete type if

1. The set $\{x : x = a_1, a_2, \ldots\}$ does not depend upon $\theta$.

2. $p(\theta)$ is a nontrivial continuous function of $\theta$, $\gamma < \theta < \delta$.

3. $K(x)$ is a nontrivial function of $x$ on the set $\{x : x = a_1, a_2, \ldots\}$.

For example, each member of the family $\{f(x; \theta) : 0 < \theta < \infty\}$,
where $f(x; \theta)$ is $N(0, \theta)$, represents a regular case of the exponential
class of the continuous type because

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\theta}} e^{-x^2/2\theta} = \exp\left(-\frac{1}{2\theta} x^2 - \ln \sqrt{2\pi\theta}\right), \quad -\infty < x < \infty.$$


Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution
that has a p.d.f. that represents a regular case of the exponential class
of the continuous type. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$\exp\left[p(\theta) \sum_{i=1}^{n} K(x_i) + \sum_{i=1}^{n} S(x_i) + n q(\theta)\right]$$

for $a < x_i < b$, $i = 1, 2, \ldots, n$, $\gamma < \theta < \delta$, and is zero elsewhere. At
points of positive probability density, this joint p.d.f. may be written
as the product of the two nonnegative functions

$$\exp\left[p(\theta) \sum_{i=1}^{n} K(x_i) + n q(\theta)\right] \exp\left[\sum_{i=1}^{n} S(x_i)\right].$$

In accordance with the factorization theorem (Theorem 1, Section 7.2)
$Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for the parameter $\theta$. To prove that
$Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$ in the discrete case, we take
the joint p.d.f. of $X_1, X_2, \ldots, X_n$ to be positive on a discrete set of
points, say, when $x_i \in \{x : x = a_1, a_2, \ldots\}$, $i = 1, 2, \ldots, n$. We then use
the factorization theorem. It is left as an exercise to show that in
either the continuous or the discrete case the p.d.f. of $Y_1$ is of the form

$$g_1(y_1; \theta) = R(y_1) \exp[p(\theta) y_1 + n q(\theta)]$$

at points of positive probability density. The points of positive
probability density and the function $R(y_1)$ do not depend upon $\theta$.
At this time we use a theorem in analysis to assert that the family
$\{g_1(y_1; \theta) : \gamma < \theta < \delta\}$ of probability density functions is complete.
This is the theorem we used when we asserted that a moment-generating
function (when it exists) uniquely determines a distribution.
In the present context it can be stated as follows.


Theorem 5. Let $f(x; \theta)$, $\gamma < \theta < \delta$, be a p.d.f. which represents a
regular case of the exponential class. Then if $X_1, X_2, \ldots, X_n$ (where $n$
is a fixed positive integer) is a random sample from a distribution with
p.d.f. $f(x; \theta)$, the statistic $Y_1 = \sum_{i=1}^{n} K(X_i)$ is a sufficient statistic for $\theta$ and
the family $\{g_1(y_1; \theta) : \gamma < \theta < \delta\}$ of probability density functions of $Y_1$
is complete. That is, $Y_1$ is a complete sufficient statistic for $\theta$.


This theorem has useful implications. In a regular case of form
(1), we can see by inspection that the sufficient statistic is $Y_1 = \sum_{i=1}^{n} K(X_i)$.
If we can see how to form a function of $Y_1$, say $\varphi(Y_1)$, so that
$E[\varphi(Y_1)] = \theta$, then the statistic $\varphi(Y_1)$ is unique and is the unbiased
minimum variance estimator of $\theta$.


Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal
distribution that has p.d.f.

$$f(x; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \theta)^2}{2\sigma^2}\right], \quad -\infty < x < \infty, \quad -\infty < \theta < \infty,$$

or

$$f(x; \theta) = \exp\left(\frac{\theta}{\sigma^2} x - \frac{x^2}{2\sigma^2} - \ln \sqrt{2\pi\sigma^2} - \frac{\theta^2}{2\sigma^2}\right).$$

Here $\sigma^2$ is any fixed positive number. This is a regular case of the exponential
class with

$$p(\theta) = \frac{\theta}{\sigma^2}, \quad K(x) = x, \quad S(x) = -\frac{x^2}{2\sigma^2} - \ln \sqrt{2\pi\sigma^2}, \quad q(\theta) = -\frac{\theta^2}{2\sigma^2}.$$

Accordingly, $Y_1 = X_1 + X_2 + \cdots + X_n = n\bar{X}$ is a complete sufficient statistic
for the mean $\theta$ of a normal distribution for every fixed value of the variance
$\sigma^2$. Since $E(Y_1) = n\theta$, then $\varphi(Y_1) = Y_1/n = \bar{X}$ is the only function of $Y_1$ that
is an unbiased estimator of $\theta$; and being a function of the sufficient statistic
$Y_1$, it has a minimum variance. That is, $\bar{X}$ is the unique unbiased minimum
variance estimator of $\theta$. Incidentally, since $Y_1$ is a one-to-one function of
$\bar{X}$, $\bar{X}$ itself is also a complete sufficient statistic for $\theta$.

Example 2. Consider a Poisson distribution with parameter $\theta$, $0 < \theta < \infty$.
The p.d.f. of this distribution is

$$f(x; \theta) = \frac{\theta^x e^{-\theta}}{x!} = \exp[(\ln \theta) x - \ln(x!) - \theta], \quad x = 0, 1, 2, \ldots,$$

and $f(x; \theta) = 0$ elsewhere.

In accordance with Theorem 5, $Y_1 = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for
$\theta$. Since $E(Y_1) = n\theta$, the statistic $\varphi(Y_1) = Y_1/n = \bar{X}$, which is also a complete
sufficient statistic for $\theta$, is the unique unbiased minimum variance estimator
of $\theta$.
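The exponential-class representations in Examples 1 and 2 can be checked against the usual formulas for the two p.d.f.s. The sketch below is an illustration and not part of the original text; the helper names and the numerical test points are assumptions, while the choices of $p$, $K$, $S$, and $q$ are the ones stated in the examples.

```python
import math

def exponential_class_pdf(x, theta, p, K, S, q):
    """Evaluate exp[p(theta) * K(x) + S(x) + q(theta)]."""
    return math.exp(p(theta) * K(x) + S(x) + q(theta))

def poisson_form(x, theta):
    """Poisson: p = ln(theta), K(x) = x, S(x) = -ln(x!), q = -theta."""
    return exponential_class_pdf(
        x, theta,
        p=math.log,
        K=lambda t: t,
        S=lambda t: -math.lgamma(t + 1),      # ln(x!) = lgamma(x + 1)
        q=lambda th: -th)

def normal_form(x, theta, sigma2):
    """N(theta, sigma2) with sigma2 fixed, written in form (1)."""
    return exponential_class_pdf(
        x, theta,
        p=lambda th: th / sigma2,
        K=lambda t: t,
        S=lambda t: -t * t / (2 * sigma2) - math.log(math.sqrt(2 * math.pi * sigma2)),
        q=lambda th: -th * th / (2 * sigma2))

def poisson_direct(x, theta):
    return theta ** x * math.exp(-theta) / math.factorial(x)

def normal_direct(x, theta, sigma2):
    return math.exp(-(x - theta) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

print(poisson_form(3, 1.7), poisson_direct(3, 1.7))
print(normal_form(0.4, 1.2, 2.5), normal_direct(0.4, 1.2, 2.5))
```

In both cases $K(x) = x$, so the sufficient statistic read off by inspection is the sum of the observations.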


EXERCISES 
7.29. Write the p.d.f.

$$f(x; \theta) = \frac{1}{6\theta^4} x^3 e^{-x/\theta}, \quad 0 < x < \infty, \quad 0 < \theta < \infty,$$

zero elsewhere, in the exponential form. If $X_1, X_2, \ldots, X_n$ is a random
sample from this distribution, find a complete sufficient statistic $Y_1$ for $\theta$
and the unique function $\varphi(Y_1)$ of this statistic that is the unbiased minimum
variance estimator of $\theta$. Is $\varphi(Y_1)$ itself a complete sufficient statistic?

7.30. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n > 1$ from a
distribution with p.d.f. $f(x; \theta) = \theta e^{-\theta x}$, $0 < x < \infty$, zero elsewhere, and
$\theta > 0$. Then $Y = \sum_{i=1}^{n} X_i$ is a sufficient statistic for $\theta$. Prove that $(n - 1)/Y$ is
the unbiased minimum variance estimator of $\theta$.

7.31. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a distribution
with p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, and $\theta > 0$.

(a) Show that the geometric mean $(X_1 X_2 \cdots X_n)^{1/n}$ of the sample is a
complete sufficient statistic for $\theta$.

(b) Find the maximum likelihood estimator of $\theta$, and observe that it is a
function of this geometric mean.

7.32. Let $\bar{X}$ denote the mean of the random sample $X_1, X_2, \ldots, X_n$ from a
gamma-type distribution with parameters $\alpha > 0$ and $\beta = \theta > 0$. Compute
$E[X_1 \mid \bar{x}]$.

Hint: Can you find directly a function $\psi(\bar{X})$ of $\bar{X}$ such that
$E[\psi(\bar{X})] = \theta$? Is $E(X_1 \mid \bar{x}) = \psi(\bar{x})$? Why?

7.33. Let $X$ be a random variable with a p.d.f. of a regular case of the
exponential class. Show that $E[K(X)] = -q'(\theta)/p'(\theta)$, provided these
derivatives exist, by differentiating both members of the equality

$$\int_a^b \exp[p(\theta) K(x) + S(x) + q(\theta)]\, dx = 1$$

with respect to $\theta$. By a second differentiation, find the variance of $K(X)$.

7.34. Given that $f(x; \theta) = \exp[\theta K(x) + S(x) + q(\theta)]$, $a < x < b$, $\gamma < \theta < \delta$,
represents a regular case of the exponential class, show that the moment-generating
function $M(t)$ of $Y = K(X)$ is $M(t) = \exp[q(\theta) - q(\theta + t)]$,
$\gamma < \theta + t < \delta$.

7.35. Given, in the preceding exercise, that $E(Y) = E[K(X)] = \theta$. Prove that
$Y$ is $N(\theta, 1)$.

Hint: Consider $M'(0) = \theta$ and solve the resulting differential equation.

7.36. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution that has a
p.d.f. which is a regular case of the exponential class, show that the p.d.f.
of $Y_1 = \sum_{i=1}^{n} K(X_i)$ is of the form $g_1(y_1; \theta) = R(y_1) \exp[p(\theta) y_1 + n q(\theta)]$.

Hint: Let $Y_2 = X_2, \ldots, Y_n = X_n$ be $n - 1$ auxiliary random variables.
Find the joint p.d.f. of $Y_1, Y_2, \ldots, Y_n$ and then the marginal p.d.f. of $Y_1$.

7.37. Let $Y$ denote the median and let $\bar{X}$ denote the mean of a random sample
of size $n = 2k + 1$ from a distribution that is $N(\mu, \sigma^2)$. Compute
$E(Y \mid \bar{X} = \bar{x})$.

Hint: See Exercise 7.32.

7.38. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta^2 x e^{-\theta x}$, $0 < x < \infty$, where $\theta > 0$.

(a) Argue that $Y = \sum_{i=1}^{n} X_i$ is a complete sufficient statistic for $\theta$.

(b) Compute $E(1/Y)$ and find the function of $Y$ which is the unique
unbiased minimum variance estimator of $\theta$.

7.39. Let $X_1, X_2, \ldots, X_n$, $n > 2$, be a random sample from the binomial
distribution $b(1, \theta)$.

(a) Show that $Y_1 = X_1 + X_2 + \cdots + X_n$ is a complete sufficient statistic
for $\theta$.

(b) Find the function $\varphi(Y_1)$ which is the unbiased minimum variance
estimator of $\theta$.

(c) Let $Y_2 = (X_1 + X_2)/2$ and compute $E(Y_2)$.

(d) Determine $E(Y_2 \mid Y_1 = y_1)$.
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7*6 Functions of a Parameter 

Up to this point we have sought an unbiased and minimum variance estimator of a parameter $\theta$. Not always, however, are we interested in $\theta$ but rather in a function of $\theta$. This will be illustrated in the following examples.


Example 1. Let $X_1, X_2, \ldots, X_n$ denote the observations of a random sample of size $n > 1$ from a distribution that is $b(1, \theta)$, $0 < \theta < 1$. We know that if $Y = \sum_{i=1}^{n} X_i$, then $Y/n$ is the unique unbiased minimum variance estimator of $\theta$. Now the variance of $Y/n$ is $\theta(1 - \theta)/n$. Suppose that an unbiased and minimum variance estimator of this variance is sought. Because $Y$ is a sufficient statistic for $\theta$, it is known that we can restrict our search to functions of $Y$. Consider the statistic $(Y/n)(1 - Y/n)/n$. This statistic is suggested by the fact that $Y/n$ is an estimator of $\theta$. The expectation of this statistic is given by

$$\frac{1}{n} E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{1}{n^2} E(Y) - \frac{1}{n^3} E(Y^2).$$

Now $E(Y) = n\theta$ and $E(Y^2) = n\theta(1 - \theta) + n^2\theta^2$. Hence

$$\frac{1}{n} E\left[\frac{Y}{n}\left(1 - \frac{Y}{n}\right)\right] = \frac{n - 1}{n} \cdot \frac{\theta(1 - \theta)}{n}.$$

If we multiply both members of this equation by $n/(n - 1)$, we find that the statistic $(Y/n)(1 - Y/n)/(n - 1)$ is the unique unbiased minimum variance estimator of the variance of $Y/n$.
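The unbiasedness claim in Example 1 can be checked exactly by summing over the binomial p.m.f. The following is only a small sketch; the function names and the values $n = 10$, $\theta = 0.3$ are illustrative assumptions, not part of the text.

```python
from math import comb

def expected_value(stat, n, theta):
    """Exact E[stat(Y)] for Y ~ b(n, theta), summing over the binomial p.m.f."""
    return sum(
        stat(y) * comb(n, y) * theta**y * (1 - theta) ** (n - y)
        for y in range(n + 1)
    )

def naive(y, n):
    # (Y/n)(1 - Y/n)/n: the first statistic considered, biased by the factor (n - 1)/n
    return (y / n) * (1 - y / n) / n

def umvue(y, n):
    # (Y/n)(1 - Y/n)/(n - 1): the unbiased minimum variance estimator
    return (y / n) * (1 - y / n) / (n - 1)

n, theta = 10, 0.3                      # illustrative values (assumed)
target = theta * (1 - theta) / n        # variance of Y/n
e_naive = expected_value(lambda y: naive(y, n), n, theta)
e_umvue = expected_value(lambda y: umvue(y, n), n, theta)
print(target, e_naive, e_umvue)
```

The second printed expectation should be $(n-1)/n$ times the target and the third should agree with the target, up to floating-point rounding.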


A somewhat different, but also very important problem in point estimation is considered in the next example. In the example the distribution of a random variable $X$ is described by a p.d.f. $f(x; \theta)$ that depends upon $\theta \in \Omega$. The problem is to estimate the fractional part of the probability for this distribution which is at, or to the left of, a fixed point $c$. Thus we seek an unbiased minimum variance estimator of $F(c; \theta)$, where $F(x; \theta)$ is the distribution function of $X$.

Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n > 1$ from a distribution that is $N(\theta, 1)$. Suppose that we wish to find an unbiased minimum variance estimator of the function of $\theta$ defined by

$$\Pr(X \le c) = \int_{-\infty}^{c} \frac{1}{\sqrt{2\pi}}\, e^{-(x - \theta)^2/2}\,dx = \Phi(c - \theta),$$

where $c$ is a fixed constant. There are many unbiased estimators of $\Phi(c - \theta)$. We first exhibit one of these, say $u(X_1)$, a function of $X_1$ alone. We shall then compute the conditional expectation, $E[u(X_1) \mid \bar{X} = \bar{x}] = \varphi(\bar{x})$, of this unbiased statistic, given the sufficient statistic $\bar{X}$, the mean of the sample. In accordance with the theorems of Rao-Blackwell and Lehmann-Scheffé, $\varphi(\bar{X})$ is the unique unbiased minimum variance estimator of $\Phi(c - \theta)$.

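Consider the function $u(x_1)$, where

$$u(x_1) = 1, \quad x_1 \le c,$$
$$\phantom{u(x_1)} = 0, \quad x_1 > c.$$

The expected value of the random variable $u(X_1)$ is given by

$$E[u(X_1)] = \int_{-\infty}^{\infty} u(x_1) \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x_1 - \theta)^2}{2}\right] dx_1 = \int_{-\infty}^{c} (1) \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{(x_1 - \theta)^2}{2}\right] dx_1,$$

because $u(x_1) = 0$, $x_1 > c$. But the latter integral has the value $\Phi(c - \theta)$. That is, $u(X_1)$ is an unbiased estimator of $\Phi(c - \theta)$.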

We shall next discuss the joint distribution of $X_1$ and $\bar{X}$ and the conditional distribution of $X_1$, given $\bar{X} = \bar{x}$. This conditional distribution will enable us to compute $E[u(X_1) \mid \bar{X} = \bar{x}] = \varphi(\bar{x})$. In accordance with Exercise 4.92, Section 4.7, the joint distribution of $X_1$ and $\bar{X}$ is bivariate normal with means $\theta$ and $\theta$, variances $\sigma_1^2 = 1$ and $\sigma_2^2 = 1/n$, and correlation coefficient $\rho = 1/\sqrt{n}$. Thus the conditional p.d.f. of $X_1$, given $\bar{X} = \bar{x}$, is normal with linear conditional mean

$$\theta + \frac{\rho\sigma_1}{\sigma_2}(\bar{x} - \theta) = \bar{x}$$

and with variance

$$\sigma_1^2(1 - \rho^2) = \frac{n - 1}{n}.$$

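The conditional expectation of $u(X_1)$, given $\bar{X} = \bar{x}$, is then

$$\varphi(\bar{x}) = \int_{-\infty}^{\infty} u(x_1) \sqrt{\frac{n}{n - 1}}\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{n(x_1 - \bar{x})^2}{2(n - 1)}\right] dx_1 = \int_{-\infty}^{c} \sqrt{\frac{n}{n - 1}}\, \frac{1}{\sqrt{2\pi}} \exp\left[-\frac{n(x_1 - \bar{x})^2}{2(n - 1)}\right] dx_1.$$

The change of variable $z = \sqrt{n}(x_1 - \bar{x})/\sqrt{n - 1}$ enables us to write, with $c' = \sqrt{n}(c - \bar{x})/\sqrt{n - 1}$, this conditional expectation as

$$\varphi(\bar{x}) = \int_{-\infty}^{c'} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz = \Phi(c') = \Phi\left[\frac{\sqrt{n}(c - \bar{x})}{\sqrt{n - 1}}\right].$$

Thus the unique, unbiased, and minimum variance estimator of $\Phi(c - \theta)$ is, for every fixed constant $c$, given by $\varphi(\bar{X}) = \Phi[\sqrt{n}(c - \bar{X})/\sqrt{n - 1}]$.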
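As a numerical sanity check of Example 2, one can integrate $\Phi[\sqrt{n}(c - \bar{x})/\sqrt{n - 1}]$ against the $N(\theta, 1/n)$ density of $\bar{X}$ and compare the result with $\Phi(c - \theta)$. The sketch below uses a plain trapezoidal rule; the function names, the grid size, and the values $n = 5$, $\theta = 0.4$, $c = 1$ are assumptions chosen for illustration.

```python
from math import erf, exp, pi, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def expected_estimator(n, theta, c, steps=20000, width=10.0):
    """Trapezoidal approximation of E[Phi(sqrt(n)(c - Xbar)/sqrt(n - 1))]
    when Xbar is N(theta, 1/n); the integral is truncated at +/- width standard deviations."""
    sd = 1.0 / sqrt(n)
    lo, hi = theta - width * sd, theta + width * sd
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        xbar = lo + i * h
        density = exp(-0.5 * ((xbar - theta) / sd) ** 2) / (sd * sqrt(2.0 * pi))
        value = Phi(sqrt(n) * (c - xbar) / sqrt(n - 1)) * density
        total += value if 0 < i < steps else 0.5 * value  # half weight at the end points
    return total * h

n, theta, c = 5, 0.4, 1.0               # illustrative values (assumed)
approx = expected_estimator(n, theta, c)
print(approx, Phi(c - theta))
```

The two printed numbers should agree closely, which is what unbiasedness of $\varphi(\bar{X})$ for $\Phi(c - \theta)$ requires.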

Remark. We should like to draw the attention of the reader to a rather important fact. This has to do with the adoption of a principle, such as the principle of unbiasedness and minimum variance. A principle is not a theorem; and seldom does a principle yield satisfactory results in all cases. So far, this principle has provided quite satisfactory results. To see that this is not always the case, let $X$ have a Poisson distribution with parameter $\theta$, $0 < \theta < \infty$. We may look upon $X$ as a random sample of size 1 from this distribution. Thus $X$ is a complete sufficient statistic for $\theta$. We seek the estimator of $e^{-2\theta}$ that is unbiased and has minimum variance. Consider $Y = (-1)^X$. We have

$$E(Y) = E[(-1)^X] = \sum_{x=0}^{\infty} \frac{(-\theta)^x e^{-\theta}}{x!} = e^{-2\theta}.$$

Accordingly, $(-1)^X$ is the unbiased minimum variance estimator of $e^{-2\theta}$. Here this estimator leaves much to be desired. We are endeavoring to elicit some information about the number $e^{-2\theta}$, where $0 < e^{-2\theta} < 1$. Yet our point estimate is either $-1$ or $+1$, each of which is a very poor estimate of a number between zero and 1. We do not wish to leave the reader with the impression that an unbiased minimum variance estimator is bad. That is not the case at all. We merely wish to point out that if one tries hard enough, one can find instances where such a statistic is not good. Incidentally, the maximum likelihood estimator of $e^{-2\theta}$ is, in the case where the sample size equals 1, $e^{-2X}$, which is probably a much better estimator in practice than is the unbiased estimator $(-1)^X$.
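The comparison made in the Remark can be quantified by computing mean squared errors from the Poisson series. This is a rough sketch only: the helper name, the truncation at 200 terms, and the choice $\theta = 1$ are assumptions made for illustration.

```python
from math import exp

def poisson_expectation(g, theta, terms=200):
    """E[g(X)] for X ~ Poisson(theta), truncating the series after `terms` terms."""
    pmf = exp(-theta)           # Pr(X = 0)
    total = 0.0
    for x in range(terms):
        total += g(x) * pmf
        pmf *= theta / (x + 1)  # Pr(X = x + 1) obtained from Pr(X = x)
    return total

theta = 1.0                     # illustrative value (assumed)
target = exp(-2 * theta)        # the quantity being estimated

mean_unbiased = poisson_expectation(lambda x: (-1) ** x, theta)
mse_unbiased = poisson_expectation(lambda x: ((-1) ** x - target) ** 2, theta)
mse_mle = poisson_expectation(lambda x: (exp(-2 * x) - target) ** 2, theta)
print(mean_unbiased, target, mse_unbiased, mse_mle)
```

For this choice of $\theta$ the mean of $(-1)^X$ matches $e^{-2\theta}$, while its mean squared error, $1 - e^{-4\theta}$, is considerably larger than that of the biased estimator $e^{-2X}$.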

EXERCISES 

7.40. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(\theta, 1)$, $-\infty < \theta < \infty$. Find the unbiased minimum variance estimator of $\theta^2$.

Hint: First determine $E(\bar{X}^2)$.

7.41. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(0, \theta)$. Then $Y = \sum X_i^2$ is a complete sufficient statistic for $\theta$. Find the unbiased minimum variance estimator of $\theta^2$.

7.42. In the notation of Example 2 of this section, is there an unbiased minimum variance estimator of $\Pr(-c \le X \le c)$? Here $c > 0$.

7.43. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\theta > 0$. Find the unbiased minimum variance estimator of $\Pr(X \le 1) = (1 + \theta)e^{-\theta}$.

Hint: Let $u(x_1) = 1$, $x_1 \le 1$, zero elsewhere, and find $E[u(X_1) \mid Y = y]$, where $Y = \sum_{i=1}^{n} X_i$. Make use of Example 2, Section 4.2.

7.44. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution with parameter $\theta > 0$. From the Remark of this section we know that $E[(-1)^{X_1}] = e^{-2\theta}$.

(a) Show that $E[(-1)^{X_1} \mid Y_1 = y_1] = (1 - 2/n)^{y_1}$, where $Y_1 = X_1 + X_2 + \cdots + X_n$.

Hint: First show that the conditional p.d.f. of $X_1, X_2, \ldots, X_{n-1}$, given $Y_1 = y_1$, is multinomial, and hence that of $X_1$ given $Y_1 = y_1$ is $b(y_1, 1/n)$.

(b) Show that the m.l.e. of $e^{-2\theta}$ is $e^{-2\bar{X}}$.

(c) Since $y_1 = n\bar{x}$, show that $(1 - 2/n)^{y_1}$ is approximately equal to $e^{-2\bar{x}}$ when $n$ is large.

7.45. Let a random sample of size $n$ be taken from a distribution that has the p.d.f. $f(x; \theta) = (1/\theta) \exp(-x/\theta) I_{(0, \infty)}(x)$. Find the m.l.e. and the unbiased minimum variance estimator of $\Pr(X \le 2)$.


7.7 The Case of Several Parameters

In many of the interesting problems we encounter, the p.d.f. may not depend upon a single parameter $\theta$, but perhaps upon two (or more) parameters, say $\theta_1$ and $\theta_2$, where $(\theta_1, \theta_2) \in \Omega$, a two-dimensional parameter space. We now define joint sufficient statistics for the parameters. For the moment we shall restrict ourselves to the case of two parameters.


Definition 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that has p.d.f. $f(x; \theta_1, \theta_2)$, where $(\theta_1, \theta_2) \in \Omega$. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ and $Y_2 = u_2(X_1, X_2, \ldots, X_n)$ be two statistics whose joint p.d.f. is $g_{12}(y_1, y_2; \theta_1, \theta_2)$. The statistics $Y_1$ and $Y_2$ are called joint sufficient statistics for $\theta_1$ and $\theta_2$ if and only if

$$\frac{f(x_1; \theta_1, \theta_2) f(x_2; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2)}{g_{12}[u_1(x_1, x_2, \ldots, x_n), u_2(x_1, x_2, \ldots, x_n); \theta_1, \theta_2]} = H(x_1, x_2, \ldots, x_n),$$

where $H(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta_1$ or $\theta_2$.

As may be anticipated, the factorization theorem can be extended. In our notation it can be stated in the following manner. The statistics $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ and $Y_2 = u_2(X_1, X_2, \ldots, X_n)$ are joint sufficient statistics for the parameters $\theta_1$ and $\theta_2$ if and only if we can find two nonnegative functions $k_1$ and $k_2$ such that

$$f(x_1; \theta_1, \theta_2) f(x_2; \theta_1, \theta_2) \cdots f(x_n; \theta_1, \theta_2) = k_1[u_1(x_1, x_2, \ldots, x_n), u_2(x_1, x_2, \ldots, x_n); \theta_1, \theta_2]\, k_2(x_1, x_2, \ldots, x_n),$$

where the function $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon both or either of $\theta_1$ and $\theta_2$.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution having p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{2\theta_2}, \quad \theta_1 - \theta_2 < x < \theta_1 + \theta_2,$$
$$\phantom{f(x; \theta_1, \theta_2)} = 0 \quad \text{elsewhere},$$

where $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics. The joint p.d.f. of $Y_1$ and $Y_n$ is given by

$$g_{1n}(y_1, y_n; \theta_1, \theta_2) = \frac{n(n - 1)}{(2\theta_2)^n} (y_n - y_1)^{n-2}, \quad \theta_1 - \theta_2 < y_1 < y_n < \theta_1 + \theta_2,$$

and equals zero elsewhere. Accordingly, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ can be written, for points of positive probability density,

$$\left(\frac{1}{2\theta_2}\right)^n = \frac{n(n - 1)[\max(x_i) - \min(x_i)]^{n-2}}{(2\theta_2)^n} \cdot \frac{1}{n(n - 1)[\max(x_i) - \min(x_i)]^{n-2}}.$$

Since $\min(x_i) \le x_j \le \max(x_i)$, $j = 1, 2, \ldots, n$, the last factor does not depend upon the parameters. Either the definition or the factorization theorem assures us that $Y_1$ and $Y_n$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.


The extension of the notion of joint sufficient statistics for more than two parameters is a natural one. Suppose that a certain p.d.f. depends upon $m$ parameters. Let a random sample of size $n$ be taken from the distribution that has this p.d.f. and define $m$ statistics. These $m$ statistics are called joint sufficient statistics for the $m$ parameters if and only if the ratio of the joint p.d.f. of the observations of the random sample and the joint p.d.f. of these $m$ statistics does not depend upon the $m$ parameters, whatever the fixed values of the $m$ statistics. Again the factorization theorem is readily extended.

There is an extension of the Rao-Blackwell theorem that can be adapted to joint sufficient statistics for several parameters, but that extension will not be included in this book. However, the concept of a complete family of probability density functions is generalized as follows: Let

$$\{f(v_1, v_2, \ldots, v_k; \theta_1, \theta_2, \ldots, \theta_m) : (\theta_1, \theta_2, \ldots, \theta_m) \in \Omega\}$$

denote a family of probability density functions of $k$ random variables $V_1, V_2, \ldots, V_k$ that depends upon $m$ parameters $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$. Let $u(v_1, v_2, \ldots, v_k)$ be a function of $v_1, v_2, \ldots, v_k$ (but not a function of any or all of the parameters). If

$$E[u(V_1, V_2, \ldots, V_k)] = 0$$

for all $(\theta_1, \theta_2, \ldots, \theta_m) \in \Omega$ implies that $u(v_1, v_2, \ldots, v_k) = 0$ at all points $(v_1, v_2, \ldots, v_k)$, except on a set of points that has probability zero for all members of the family of probability density functions, we shall say that the family of probability density functions is a complete family.

The remainder of our treatment of the case of several parameters will be restricted to probability density functions that represent what we shall call regular cases of the exponential class. Let $X_1, X_2, \ldots, X_n$, $n > m$, denote a random sample from a distribution that depends on $m$ parameters and has a p.d.f. of the form

$$f(x; \theta_1, \theta_2, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \theta_2, \ldots, \theta_m) K_j(x) + S(x) + q(\theta_1, \theta_2, \ldots, \theta_m)\right] \tag{1}$$

for $a < x < b$, and equals zero elsewhere.

A p.d.f. of the form (1) is said to be a member of the exponential class of probability density functions of the continuous type. If, in addition,

1. neither $a$ nor $b$ depends upon any or all of the parameters $\theta_1, \theta_2, \ldots, \theta_m$,

2. the $p_j(\theta_1, \theta_2, \ldots, \theta_m)$, $j = 1, 2, \ldots, m$, are nontrivial, functionally independent, continuous functions of $\theta_j$, $\gamma_j < \theta_j < \delta_j$, $j = 1, 2, \ldots, m$,

3. the $K_j'(x)$, $j = 1, 2, \ldots, m$, are continuous for $a < x < b$ and no one is a linear homogeneous function of the others,

4. $S(x)$ is a continuous function of $x$, $a < x < b$,

we say that we have a regular case of the exponential class.

The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is given, at points of positive probability density, by

$$\exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) \sum_{i=1}^{n} K_j(x_i) + \sum_{i=1}^{n} S(x_i) + nq(\theta_1, \ldots, \theta_m)\right]$$
$$= \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) \sum_{i=1}^{n} K_j(x_i) + nq(\theta_1, \ldots, \theta_m)\right] \times \exp\left[\sum_{i=1}^{n} S(x_i)\right].$$

In accordance with the factorization theorem, the statistics

$$Y_1 = \sum_{i=1}^{n} K_1(X_i), \quad Y_2 = \sum_{i=1}^{n} K_2(X_i), \quad \ldots, \quad Y_m = \sum_{i=1}^{n} K_m(X_i)$$

are joint sufficient statistics for the $m$ parameters $\theta_1, \theta_2, \ldots, \theta_m$. It is left as an exercise to prove that the joint p.d.f. of $Y_1, \ldots, Y_m$ is of the form

$$R(y_1, \ldots, y_m) \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) y_j + nq(\theta_1, \ldots, \theta_m)\right] \tag{2}$$

at points of positive probability density. These points of positive probability density and the function $R(y_1, \ldots, y_m)$ do not depend upon any or all of the parameters $\theta_1, \theta_2, \ldots, \theta_m$. Moreover, in accordance with a theorem in analysis, it can be asserted that, in a regular case of the exponential class, the family of probability density functions of these joint sufficient statistics $Y_1, Y_2, \ldots, Y_m$ is complete when $n > m$. In accordance with a convention previously adopted, we shall refer to $Y_1, Y_2, \ldots, Y_m$ as joint complete sufficient statistics for the parameters $\theta_1, \theta_2, \ldots, \theta_m$.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Thus the p.d.f. $f(x; \theta_1, \theta_2)$ of the distribution may be written as

$$f(x; \theta_1, \theta_2) = \exp\left(-\frac{1}{2\theta_2} x^2 + \frac{\theta_1}{\theta_2} x - \frac{\theta_1^2}{2\theta_2} - \ln\sqrt{2\pi\theta_2}\right).$$

Therefore, we can take $K_1(x) = x^2$ and $K_2(x) = x$. Consequently, the statistics

$$Y_1 = \sum_{i=1}^{n} X_i^2 \quad \text{and} \quad Y_2 = \sum_{i=1}^{n} X_i$$

are joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Since the relations

$$Z_1 = \frac{Y_2}{n} = \bar{X}, \qquad Z_2 = \frac{Y_1 - Y_2^2/n}{n - 1} = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

define a one-to-one transformation, $Z_1$ and $Z_2$ are also joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Moreover,

$$E(Z_1) = \theta_1 \quad \text{and} \quad E(Z_2) = \theta_2.$$

From completeness, we have that $Z_1$ and $Z_2$ are the only functions of $Y_1$ and $Y_2$ that are unbiased estimators of $\theta_1$ and $\theta_2$, respectively.
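The transformation from $(Y_1, Y_2)$ to $(Z_1, Z_2)$ in Example 2 is easy to carry out numerically. The following sketch uses a made-up data set and hypothetical function names; it merely confirms that $Z_1$ and $Z_2$ computed from the two sums coincide with the usual sample mean and the sample variance with divisor $n - 1$.

```python
from statistics import mean, variance

def joint_sufficient(xs):
    """Return (Y1, Y2) = (sum of x_i^2, sum of x_i)."""
    return sum(x * x for x in xs), sum(xs)

def to_mean_and_variance(y1, y2, n):
    """One-to-one map (Y1, Y2) -> (Z1, Z2) used in Example 2."""
    z1 = y2 / n
    z2 = (y1 - y2 * y2 / n) / (n - 1)
    return z1, z2

xs = [2.1, 3.4, 1.9, 4.0, 2.6]          # hypothetical observations
y1, y2 = joint_sufficient(xs)
z1, z2 = to_mean_and_variance(y1, y2, len(xs))
print(z1, mean(xs))                     # Z1 next to the sample mean
print(z2, variance(xs))                 # Z2 next to the (n - 1)-divisor sample variance
```

Only the two sums $Y_1$ and $Y_2$ (and $n$) are needed to recover both estimates, which is the practical content of joint sufficiency here.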

A p.d.f.

$$f(x; \theta_1, \theta_2, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \theta_2, \ldots, \theta_m) K_j(x) + S(x) + q(\theta_1, \theta_2, \ldots, \theta_m)\right], \quad x = a_1, a_2, a_3, \ldots,$$

zero elsewhere, is said to represent a regular case of the exponential class of probability density functions of the discrete type if

1. the set $\{x : x = a_1, a_2, \ldots\}$ does not depend upon any or all of the parameters $\theta_1, \theta_2, \ldots, \theta_m$,

2. the $p_j(\theta_1, \theta_2, \ldots, \theta_m)$, $j = 1, 2, \ldots, m$, are nontrivial, functionally independent, and continuous functions of $\theta_j$, $\gamma_j < \theta_j < \delta_j$, $j = 1, 2, \ldots, m$,

3. the $K_j(x)$, $j = 1, 2, \ldots, m$, are nontrivial functions of $x$ on the set $\{x : x = a_1, a_2, \ldots\}$ and no one is a linear function of the others.

Let $X_1, X_2, \ldots, X_n$ denote a random sample from a discrete-type distribution that represents a regular case of the exponential class. Then the statements made above in connection with the random variable of the continuous type are also valid here.

Not always do we sample from a distribution of one random variable $X$. We could, for instance, sample from a distribution of two random variables $V$ and $W$ with joint p.d.f. $f(v, w; \theta_1, \theta_2, \ldots, \theta_m)$. Recall that by a random sample $(V_1, W_1), (V_2, W_2), \ldots, (V_n, W_n)$ from a distribution of this sort, we mean that the joint p.d.f. of these $2n$ random variables is given by

$$f(v_1, w_1; \theta_1, \ldots, \theta_m) f(v_2, w_2; \theta_1, \ldots, \theta_m) \cdots f(v_n, w_n; \theta_1, \ldots, \theta_m).$$

In particular, suppose that the random sample is taken from a distribution that has the p.d.f. of $V$ and $W$ of the exponential class

$$f(v, w; \theta_1, \ldots, \theta_m) = \exp\left[\sum_{j=1}^{m} p_j(\theta_1, \ldots, \theta_m) K_j(v, w) + S(v, w) + q(\theta_1, \ldots, \theta_m)\right] \tag{3}$$

for $a < v < b$, $c < w < d$, and equals zero elsewhere, where $a$, $b$, $c$, $d$ do not depend on the parameters and conditions similar to 1 to 4 for the continuous case above are imposed. Then the $m$ statistics

$$Y_1 = \sum_{i=1}^{n} K_1(V_i, W_i), \quad \ldots, \quad Y_m = \sum_{i=1}^{n} K_m(V_i, W_i)$$

are joint complete sufficient statistics for the $m$ parameters $\theta_1, \theta_2, \ldots, \theta_m$.

EXERCISES 

7.46. Let $Y_1 < Y_2 < Y_3$ be the order statistics of a random sample of size 3 from the distribution with p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2} \exp\left(-\frac{x - \theta_1}{\theta_2}\right), \quad \theta_1 < x < \infty, \quad -\infty < \theta_1 < \infty, \quad 0 < \theta_2 < \infty,$$

zero elsewhere. Find the joint p.d.f. of $Z_1 = Y_1$, $Z_2 = Y_2$, and $Z_3 = Y_1 + Y_2 + Y_3$. The corresponding transformation maps the space $\{(y_1, y_2, y_3) : \theta_1 < y_1 < y_2 < y_3 < \infty\}$ onto the space

$$\{(z_1, z_2, z_3) : \theta_1 < z_1 < z_2 < (z_3 - z_1)/2 < \infty\}.$$

Show that $Z_1$ and $Z_3$ are joint sufficient statistics for $\theta_1$ and $\theta_2$.

7.47. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that has a p.d.f. of form (1) of this section. Show that $Y_1 = \sum_{i=1}^{n} K_1(X_i), \ldots, Y_m = \sum_{i=1}^{n} K_m(X_i)$ have a joint p.d.f. of form (2) of this section.

7.48. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of size $n$ from a bivariate normal distribution with means $\mu_1$ and $\mu_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. Show that $\sum X_i$, $\sum Y_i$, $\sum X_i^2$, $\sum Y_i^2$, and $\sum X_i Y_i$ are joint complete sufficient statistics for the five parameters. Are $\bar{X} = \sum X_i/n$, $\bar{Y} = \sum Y_i/n$, $S_1^2 = \sum (X_i - \bar{X})^2/n$, $S_2^2 = \sum (Y_i - \bar{Y})^2/n$, and $\sum (X_i - \bar{X})(Y_i - \bar{Y})/nS_1S_2$ also joint complete sufficient statistics for these parameters?

7.49. Let the p.d.f. $f(x; \theta_1, \theta_2)$ be of the form

$$\exp[p_1(\theta_1, \theta_2)K_1(x) + p_2(\theta_1, \theta_2)K_2(x) + S(x) + q(\theta_1, \theta_2)], \quad a < x < b,$$

zero elsewhere. Let $K_1'(x) = cK_2'(x)$. Show that $f(x; \theta_1, \theta_2)$ can be written in the form

$$\exp[p(\theta_1, \theta_2)K(x) + S(x) + q_1(\theta_1, \theta_2)], \quad a < x < b,$$

zero elsewhere. This is the reason why it is required that no one $K_j'(x)$ be a linear homogeneous function of the others, that is, so that the number of sufficient statistics equals the number of parameters.

7.50. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from a distribution of the continuous type with p.d.f. $f(x)$. Show that the ratio of the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and that of $Y_1 < Y_2 < \cdots < Y_n$ is equal to $1/n!$, which does not depend upon the underlying p.d.f. This suggests that $Y_1 < Y_2 < \cdots < Y_n$ are joint sufficient statistics for the unknown "parameter" $f$.

7.51. Let $X_1, X_2, \ldots, X_n$ be a random sample from the uniform distribution with p.d.f. $f(x; \theta_1, \theta_2) = 1/(2\theta_2)$, $\theta_1 - \theta_2 < x < \theta_1 + \theta_2$, where $-\infty < \theta_1 < \infty$ and $\theta_2 > 0$, and the p.d.f. is equal to zero elsewhere.

(a) Show that $Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$, the joint sufficient statistics for $\theta_1$ and $\theta_2$, are complete.

(b) Find the unbiased minimum variance estimators of $\theta_1$ and $\theta_2$.

7.52. Let $X_1, X_2, \ldots, X_n$ be a random sample from $N(\theta_1, \theta_2)$.

(a) If the constant $b$ is defined by the equation $\Pr(X \le b) = 0.90$, find the m.l.e. and the unbiased minimum variance estimator of $b$.

(b) If $c$ is a given constant, find the m.l.e. and the unbiased minimum variance estimator of $\Pr(X \le c)$.

7.8 Minimal Sufficient and Ancillary Statistics 

In the study of statistics, it is clear that we want to reduce the data contained in the entire sample as much as possible without losing relevant information about the important characteristics of the underlying distribution. That is, a large collection of numbers in the sample is not as meaningful as a few good summary statistics of those data. Sufficient statistics, if they exist, are valuable because we know that the statistician with those summary measures is as well off as the statistician with the entire sample. Sometimes, however, there are several sets of joint sufficient statistics, and thus we would like to find the simplest one of these sets. For illustration, in a sense, the observations $X_1, X_2, \ldots, X_n$, $n > 2$, of a random sample from $N(\theta_1, \theta_2)$ could be thought of as joint sufficient statistics for $\theta_1$ and $\theta_2$. We know, however, that we can use $\bar{X}$ and $S^2$ as joint sufficient statistics for those parameters, which is a great simplification over using $X_1, X_2, \ldots, X_n$, particularly if $n$ is large.

In most instances in this chapter, we have been able to find a single sufficient statistic for one parameter or two joint sufficient statistics for two parameters. Possibly the most complicated case considered so far is given in Exercise 7.48, in which we find five joint sufficient statistics for five parameters. Exercise 7.50 suggests the possibility of using the order statistics of a random sample for some completely unknown distribution of the continuous type.

What we would like to do is to change from one set of joint sufficient statistics to another, always reducing the number of statistics involved until we cannot go any further without losing the sufficiency of the resulting statistics. Those statistics that are there at the end of this process are called minimal sufficient statistics for the parameters. That is, minimal sufficient statistics are those that are sufficient for the parameters and are functions of every other set of sufficient statistics for those same parameters. Often, if there are $k$ parameters, we can find $k$ joint sufficient statistics that are minimal. In particular, if there is one parameter, we can often find a single sufficient statistic which is minimal. Most of the earlier examples that we have considered illustrate this point, but this is not always the case, as shown by the following example.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from the uniform distribution over the interval $(\theta - 1, \theta + 1)$ having p.d.f.

$$f(x; \theta) = \tfrac{1}{2} I_{(\theta - 1, \theta + 1)}(x), \quad \text{where } -\infty < \theta < \infty.$$

The joint p.d.f. of $X_1, X_2, \ldots, X_n$ equals the product of $(\tfrac{1}{2})^n$ and certain indicator functions, namely

$$\left(\tfrac{1}{2}\right)^n \prod_{i=1}^{n} I_{(\theta - 1, \theta + 1)}(x_i) = \left(\tfrac{1}{2}\right)^n \{I_{(\theta - 1, \theta + 1)}[\min(x_i)]\}\{I_{(\theta - 1, \theta + 1)}[\max(x_i)]\},$$

because $\theta - 1 < \min(x_i) \le x_j \le \max(x_i) < \theta + 1$, $j = 1, 2, \ldots, n$. Thus the order statistics $Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$ are the sufficient statistics for $\theta$. These two statistics actually are minimal for this one parameter, as we cannot reduce the number of them to less than two and still have sufficiency.

There is an observation that helps us see that almost all the sufficient statistics that we have studied thus far are minimal. We have noted that the m.l.e. $\hat{\theta}$ of $\theta$ is a function of one or more sufficient statistics, when the latter exist. Suppose that this m.l.e. $\hat{\theta}$ is also sufficient. Since this sufficient statistic $\hat{\theta}$ is a function of the other sufficient statistics, it must be minimal. For example, we have

1. The m.l.e. $\hat{\theta} = \bar{X}$ of $\theta$ in $N(\theta, \sigma^2)$, $\sigma^2$ known, is a minimal sufficient statistic for $\theta$.

2. The m.l.e. $\hat{\theta} = \bar{X}$ of $\theta$ in a Poisson distribution with mean $\theta$ is a minimal sufficient statistic for $\theta$.

3. The m.l.e. $\hat{\theta} = Y_n = \max(X_i)$ of $\theta$ in the uniform distribution over $(0, \theta)$ is a minimal sufficient statistic for $\theta$.

4. The maximum likelihood estimators $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$ of $\theta_1$ and $\theta_2$ in $N(\theta_1, \theta_2)$ are joint minimal sufficient statistics for $\theta_1$ and $\theta_2$.

From these examples we see that the minimal sufficient statistics do not need to be unique, for any one-to-one transformation of them also provides minimal sufficient statistics. For illustration, in 4, the $\sum X_i$ and $\sum X_i^2$ are also minimal sufficient statistics for $\theta_1$ and $\theta_2$.

Example 2. Consider the model given in Example 1. There we noted that $Y_1 = \min(X_i)$ and $Y_n = \max(X_i)$ are joint sufficient statistics. Also, we have

$$\theta - 1 < Y_1 < Y_n < \theta + 1$$

or, equivalently,

$$Y_n - 1 < \theta < Y_1 + 1.$$

Hence, to maximize the likelihood function so that it equals $(\tfrac{1}{2})^n$, $\theta$ can be any value between $Y_n - 1$ and $Y_1 + 1$. For example, many statisticians take the m.l.e. to be the mean of these two end points, namely

$$\hat{\theta} = \frac{Y_n - 1 + Y_1 + 1}{2} = \frac{Y_1 + Y_n}{2},$$

which is the midrange. We recognize, however, that this m.l.e. is not unique. Some might argue that since $\hat{\theta}$ is an m.l.e. of $\theta$ and since it is a function of the joint sufficient statistics, $Y_1$ and $Y_n$, for $\theta$, it will be a minimal sufficient statistic. This is not the case at all, for $\hat{\theta}$ is not even sufficient. Note that the m.l.e. must itself be a sufficient statistic for the parameter before it can be considered the minimal sufficient statistic.

There is also a relationship between a minimal sufficient statistic and completeness that is explained more fully in the 1950 article by Lehmann and Scheffé. Let us say simply and without explanation that for the cases in this book, complete sufficient statistics are minimal sufficient statistics. The converse is not true, however, as can be seen by noting that in Example 1 we have

$$E\left[\frac{Y_n - Y_1}{2} - \frac{n - 1}{n + 1}\right] = 0, \quad \text{for all } \theta.$$

That is, there is a nonzero function of those minimal sufficient statistics, $Y_1$ and $Y_n$, whose expectation is zero for all $\theta$.
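The failure of completeness described above rests on the fact that half the sample range of a uniform $(\theta - 1, \theta + 1)$ sample has expectation $(n - 1)/(n + 1)$ whatever the value of $\theta$. A rough Monte Carlo sketch of this fact follows; the function name, the seed, the number of replications, and the values of $\theta$ and $n$ are all assumptions made for illustration.

```python
import random

def mean_of_u(theta, n, reps=20000, seed=0):
    """Monte Carlo estimate of E[(Yn - Y1)/2 - (n - 1)/(n + 1)]
    for samples of size n from the uniform distribution over (theta - 1, theta + 1)."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(reps):
        xs = [rng.uniform(theta - 1.0, theta + 1.0) for _ in range(n)]
        total += (max(xs) - min(xs)) / 2.0 - (n - 1) / (n + 1)
    return total / reps

n = 5                                               # illustrative sample size (assumed)
estimates = {theta: mean_of_u(theta, n) for theta in (-3.0, 0.0, 7.5)}
print(estimates)
```

Each estimate should be close to zero, up to simulation error, for every $\theta$ tried, even though the function of $Y_1$ and $Y_n$ being averaged is not identically zero.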

There are other statistics that almost seem opposites of sufficient statistics. That is, while sufficient statistics contain all the information about the parameters, these other statistics, called ancillary statistics, have distributions free of the parameters and seemingly contain no information about those parameters. As an illustration, we know that the variance $S^2$ of a random sample from $N(\theta, 1)$ has a distribution that does not depend upon $\theta$ and hence is an ancillary statistic. Another example is the ratio $Z = X_1/(X_1 + X_2)$, where $X_1, X_2$ is a random sample from a gamma distribution with known parameter $\alpha > 0$ and unknown parameter $\beta = \theta$, because $Z$ has a beta distribution that is free of $\theta$. There are a great number of examples of ancillary statistics, and we provide some rules that make them rather easy to find with certain models.

First consider the situation in which there is a location parameter. That is, let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that has a p.d.f. of the form $f(x - \theta)$, for every real $\theta$; that is, $\theta$ is a location parameter. Let $Z = u(X_1, X_2, \ldots, X_n)$ be a statistic such that

$$u(x_1 + d, x_2 + d, \ldots, x_n + d) = u(x_1, x_2, \ldots, x_n)$$

for all real $d$. The one-to-one transformation defined by $W_i = X_i - \theta$, $i = 1, 2, \ldots, n$, requires that the joint p.d.f. of $W_1, W_2, \ldots, W_n$ be

$$f(w_1)f(w_2) \cdots f(w_n),$$

which does not depend upon $\theta$. In addition, we have, because of the special functional nature of $u(x_1, x_2, \ldots, x_n)$, that

$$Z = u(W_1 + \theta, W_2 + \theta, \ldots, W_n + \theta) = u(W_1, W_2, \ldots, W_n)$$

is a function of $W_1, W_2, \ldots, W_n$ alone (not of $\theta$). Hence $Z$ must have a distribution that does not depend upon $\theta$ because, for illustration, the m.g.f. of $Z$, namely

$$E(e^{tZ}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{tu(x_1, \ldots, x_n)} f(x_1 - \theta) \cdots f(x_n - \theta)\,dx_1 \cdots dx_n$$
$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{tu(w_1, \ldots, w_n)} f(w_1) \cdots f(w_n)\,dw_1 \cdots dw_n,$$

is free of $\theta$. We call $Z = u(X_1, X_2, \ldots, X_n)$ a location-invariant statistic.

We immediately see that we can construct many examples of location-invariant statistics: the sample variance $= S^2$, the sample range $= Y_n - Y_1$, the mean deviation from the sample median $= (1/n) \sum |X_i - \text{median}(X_i)|$, $X_1 + X_2 - X_3 - X_4$, $X_1 + X_3 - 2X_2$, $(1/n) \sum [X_i - \min(X_i)]$, and so on.
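The defining property $u(x_1 + d, \ldots, x_n + d) = u(x_1, \ldots, x_n)$ can be checked directly on a small data set. The sketch below is illustrative only: the data, the shift $d$, and the function name are assumptions, and `statistics.variance` uses the divisor $n - 1$ (the invariance holds with either divisor).

```python
from statistics import median, variance

def location_invariant_stats(xs):
    """A few of the location-invariant statistics listed above (needs len(xs) >= 4)."""
    n = len(xs)
    med = median(xs)
    return (
        variance(xs),                       # sample variance (divisor n - 1 here)
        max(xs) - min(xs),                  # sample range Yn - Y1
        sum(abs(x - med) for x in xs) / n,  # mean deviation from the sample median
        xs[0] + xs[1] - xs[2] - xs[3],      # X1 + X2 - X3 - X4
    )

xs = [1.2, -0.7, 3.1, 0.4, 2.2]             # hypothetical observations
d = 10.0                                    # arbitrary shift
before = location_invariant_stats(xs)
after = location_invariant_stats([x + d for x in xs])
print(before)
print(after)
```

Up to floating-point rounding the two printed tuples coincide, whereas a statistic such as the sample mean would change by exactly $d$.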

We now consider a scale-invariant statistic. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that has a p.d.f. of the form $(1/\theta)f(x/\theta)$, for all $\theta > 0$; that is, $\theta$ is a scale parameter. Say that $Z = u(X_1, X_2, \ldots, X_n)$ is a statistic such that

$$u(cx_1, cx_2, \ldots, cx_n) = u(x_1, x_2, \ldots, x_n)$$

for all $c > 0$. The one-to-one transformation defined by $W_i = X_i/\theta$, $i = 1, 2, \ldots, n$, requires the following: (1) that the joint p.d.f. of $W_1, W_2, \ldots, W_n$ be equal to

$$f(w_1)f(w_2) \cdots f(w_n),$$

and (2) that the statistic $Z$ be equal to

$$Z = u(\theta W_1, \theta W_2, \ldots, \theta W_n) = u(W_1, W_2, \ldots, W_n).$$

Since neither the joint p.d.f. of $W_1, W_2, \ldots, W_n$ nor $Z$ contains $\theta$, the distribution of $Z$ must not depend upon $\theta$. There are also many examples of scale-invariant statistics like this $Z$: $X_1/(X_1 + X_2)$, $X_1^2/\sum_{i=1}^{n} X_i^2$, $\min(X_i)/\max(X_i)$, and so on.
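The scale-invariance property $u(cx_1, \ldots, cx_n) = u(x_1, \ldots, x_n)$ for $c > 0$ can be illustrated in the same spirit. The data, the factor $c$, and the function name below are assumptions made for the sake of a short sketch.

```python
def scale_invariant_stats(xs):
    """Three scale-invariant statistics from the text (positive data assumed)."""
    return (
        xs[0] / (xs[0] + xs[1]),              # X1/(X1 + X2)
        xs[0] ** 2 / sum(x * x for x in xs),  # X1^2 / sum of Xi^2
        min(xs) / max(xs),                    # min(Xi)/max(Xi)
    )

xs = [0.8, 2.5, 1.1, 4.2]                     # hypothetical positive observations
c = 3.7                                       # arbitrary positive scale factor
before = scale_invariant_stats(xs)
after = scale_invariant_stats([c * x for x in xs])
print(before)
print(after)
```

Multiplying every observation by $c$ leaves each ratio unchanged up to rounding, since the factor cancels between numerator and denominator.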

Finally, the location and the scale parameters can be combined in a p.d.f. of the form $(1/\theta_2)f[(x - \theta_1)/\theta_2]$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Through a one-to-one transformation defined by $W_i = (X_i - \theta_1)/\theta_2$, $i = 1, 2, \ldots, n$, it is easy to show that a statistic $Z = u(X_1, X_2, \ldots, X_n)$ such that

$$u(cx_1 + d, \ldots, cx_n + d) = u(x_1, \ldots, x_n)$$

for $-\infty < d < \infty$, $0 < c < \infty$, has a distribution that does not depend upon $\theta_1$ and $\theta_2$. Statistics like this $Z = u(X_1, X_2, \ldots, X_n)$ are location-and-scale-invariant statistics. Again there are many examples, such as $[\max(X_i) - \min(X_i)]/S$ and $(X_i - \bar{X})/S$.
Thus these location-invariant, scale-invariant, and location-and-scale-invariant statistics provide good illustrations, with the appropriate model for the p.d.f., of ancillary statistics. Since an ancillary statistic and a complete (minimal) sufficient statistic are such opposites, we might believe that there is, in some sense, no relationship between the two. This is true, and in the next section we show that they are independent statistics.


EXERCISES 

7.53. Let $X_1, X_2, \ldots, X_n$ be a random sample from each of the following distributions involving the parameter $\theta$. In each case find the m.l.e. of $\theta$ and show that it is a sufficient statistic for $\theta$ and hence a minimal sufficient statistic.

(a) $b(1, \theta)$, where $0 \le \theta \le 1$.

(b) Poisson with mean $\theta > 0$.

(c) Gamma with $\alpha = 3$ and $\beta = \theta > 0$.

(d) $N(\theta, 1)$, where $-\infty < \theta < \infty$.

(e) $N(0, \theta)$, where $0 < \theta < \infty$.

7.54. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from the uniform distribution over the closed interval $[-\theta, \theta]$ having p.d.f. $f(x; \theta) = (1/2\theta) I_{[-\theta, \theta]}(x)$.

(a) Show that $Y_1$ and $Y_n$ are joint sufficient statistics for $\theta$.

(b) Argue that the m.l.e. of $\theta$ equals $\hat{\theta} = \max(-Y_1, Y_n)$.

(c) Demonstrate that the m.l.e. $\hat{\theta}$ is a sufficient statistic for $\theta$ and thus is a minimal sufficient statistic for $\theta$.

7.55. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from a distribution with p.d.f.

$$f(x; \theta_1, \theta_2) = \frac{1}{\theta_2}\, e^{-(x - \theta_1)/\theta_2} I_{(\theta_1, \infty)}(x),$$

where $-\infty < \theta_1 < \infty$ and $0 < \theta_2 < \infty$. Find joint minimal sufficient statistics for $\theta_1$ and $\theta_2$.

7.56. With random samples from each of the distributions given in Exercises 7.53(d), 7.54, and 7.55, define at least two ancillary statistics that are different from the examples given in the text. These examples illustrate, respectively, location-invariant, scale-invariant, and location-and-scale-invariant statistics.


7.9 Sufficiency, Completeness, and Independence


We have noted that if we have a sufficient statistic $Y_1$ for a parameter $\theta$, $\theta \in \Omega$, then $h(z \mid y_1)$, the conditional p.d.f. of another statistic $Z$, given $Y_1 = y_1$, does not depend upon $\theta$. If, moreover, $Y_1$ and $Z$ are independent, the p.d.f. $g_2(z)$ of $Z$ is such that $g_2(z) = h(z \mid y_1)$, and hence $g_2(z)$ must not depend upon $\theta$ either. So the independence of a statistic $Z$ and the sufficient statistic $Y_1$ for a parameter $\theta$ means that the distribution of $Z$ does not depend upon $\theta \in \Omega$. That is, $Z$ is an ancillary statistic.


It is interesting to investigate a converse of that property. Suppose that the distribution of an ancillary statistic $Z$ does not depend upon $\theta$; then, are $Z$ and the sufficient statistic $Y_1$ for $\theta$ independent? To begin our search for the answer, we know that the joint p.d.f. of $Y_1$ and $Z$ is $g_1(y_1; \theta)h(z \mid y_1)$, where $g_1(y_1; \theta)$ and $h(z \mid y_1)$ represent the marginal p.d.f. of $Y_1$ and the conditional p.d.f. of $Z$ given $Y_1 = y_1$, respectively. Thus the marginal p.d.f. of $Z$ is

$$\int_{-\infty}^{\infty} g_1(y_1; \theta)h(z \mid y_1)\,dy_1 = g_2(z),$$

which, by hypothesis, does not depend upon $\theta$. Because

$$\int_{-\infty}^{\infty} g_2(z)g_1(y_1; \theta)\,dy_1 = g_2(z),$$

it follows, by taking the difference of the last two integrals, that

$$\int_{-\infty}^{\infty} [g_2(z) - h(z \mid y_1)]g_1(y_1; \theta)\,dy_1 = 0 \tag{1}$$

for all $\theta \in \Omega$. Since $Y_1$ is a sufficient statistic for $\theta$, $h(z \mid y_1)$ does not depend upon $\theta$. By assumption, $g_2(z)$ and hence $g_2(z) - h(z \mid y_1)$ do not depend upon $\theta$. Now if the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ is complete, Equation (1) would require that

$$g_2(z) - h(z \mid y_1) = 0 \quad \text{or} \quad g_2(z) = h(z \mid y_1).$$

That is, the joint p.d.f. of $Y_1$ and $Z$ must be equal to

$$g_1(y_1; \theta)h(z \mid y_1) = g_1(y_1; \theta)g_2(z).$$

Accordingly, $Y_1$ and $Z$ are independent, and we have proved the following theorem, which was considered in special cases by Neyman and Hogg and proved in general by Basu.

Theorem 6. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution having a p.d.f. $f(x; \theta)$, $\theta \in \Omega$, where $\Omega$ is an interval set. Let $Y_1 = u_1(X_1, X_2, \ldots, X_n)$ be a sufficient statistic for $\theta$, and let the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ of probability density functions of $Y_1$ be complete. Let $Z = u(X_1, X_2, \ldots, X_n)$ be any other statistic (not a function of $Y_1$ alone). If the distribution of $Z$ does not depend upon $\theta$, then $Z$ is independent of the sufficient statistic $Y_1$.

In the discussion above, it is interesting to observe that if $Y_1$ is a sufficient statistic for $\theta$, then the independence of $Y_1$ and $Z$ implies that the distribution of $Z$ does not depend upon $\theta$ whether $\{g_1(y_1; \theta) : \theta \in \Omega\}$ is or is not complete. However, in the converse, to prove the independence from the fact that $g_2(z)$ does not depend upon $\theta$, we definitely need the completeness. Accordingly, if we are dealing with situations in which we know that the family $\{g_1(y_1; \theta) : \theta \in \Omega\}$ is complete (such as a regular case of the exponential class), we can say that the statistic $Z$ is independent of the sufficient statistic $Y_1$ if, and only if, the distribution of $Z$ does not depend upon $\theta$ (i.e., $Z$ is an ancillary statistic).

It should be remarked that the theorem (including the special 
formulation of it for regular cases of the exponential class) extends 
immediately to probability density functions that involve m parameters 
for which there exist m joint sufficient statistics. For example, let 

$X_1, X_2, \ldots, X_n$ be a random sample from a distribution having the p.d.f. $f(x; \theta_1, \theta_2)$ that represents a regular case of the exponential class such that there are two joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Then any other statistic $Z = u(X_1, \ldots, X_n)$ is independent of the joint complete sufficient statistics if and only if the distribution of $Z$ does not depend upon $\theta_1$ or $\theta_2$.

We give an example of the theorem that provides an alternative proof of the independence of $\bar X$ and $S^2$, the mean and the variance of a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. This proof is presented as if we did not know that $nS^2/\sigma^2$ is $\chi^2(n-1)$, because that fact and the independence were established in the same argument (see Section 4.8).

Example 1. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a distribution that is $N(\mu, \sigma^2)$. We know that the mean $\bar X$ of the sample is, for every known $\sigma^2$, a complete sufficient statistic for the parameter $\mu$, $-\infty < \mu < \infty$. Consider the statistic

\[
S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2,
\]

which is location-invariant. Thus $S^2$ must have a distribution that does not depend upon $\mu$; and hence, by the theorem, $S^2$ and $\bar X$, the complete sufficient statistic for $\mu$, are independent.
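
The conclusion of Example 1 can be checked numerically. The following is a minimal simulation sketch that is not part of the original text; it assumes NumPy is available, and the function name, seed, and parameter values are illustrative choices. A sample correlation near zero is only a necessary consequence of independence, so the check is suggestive and not a proof.

```python
import numpy as np


def simulate_mean_and_variance(mu, sigma, n, reps, seed=0):
    """Return arrays of X-bar and S^2 (divisor n, as in the text) over reps samples."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1)  # NumPy's default ddof=0 gives the divisor-n variance
    return xbar, s2


xbar, s2 = simulate_mean_and_variance(mu=3.0, sigma=2.0, n=10, reps=200_000)
corr = np.corrcoef(xbar, s2)[0, 1]
print(f"sample correlation of X-bar and S^2: {corr:.4f}")

# S^2 is location-invariant: shifting every observation leaves it unchanged.
sample = np.array([1.0, 4.0, 2.5, 7.0])
shift_invariant = bool(np.isclose(sample.var(), (sample + 10.0).var()))
print(shift_invariant)
```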

Example 2. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the distribution having p.d.f.

\[
f(x; \theta) = e^{-(x - \theta)}, \quad \theta < x < \infty, \quad -\infty < \theta < \infty,
\]
\[
\phantom{f(x; \theta)} = 0 \quad \text{elsewhere}.
\]

Here the p.d.f. is of the form $f(x - \theta)$, where $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Moreover, we know (Exercise 7.27) that the first order statistic $Y_1 = \min(X_i)$ is a complete sufficient statistic for $\theta$. Hence $Y_1$ must be independent of each location-invariant statistic $u(X_1, X_2, \ldots, X_n)$, enjoying the property that

\[
u(x_1 + d, x_2 + d, \ldots, x_n + d) = u(x_1, x_2, \ldots, x_n)
\]

for all real $d$. Illustrations of such statistics are $S^2$, the sample range, and

\[
\frac{1}{n} \sum_{i=1}^{n} [X_i - \min(X_i)].
\]


Example 3. Let $X_1, X_2$ denote a random sample of size $n = 2$ from a distribution with p.d.f.

\[
f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < \infty, \quad 0 < \theta < \infty,
\]
\[
\phantom{f(x; \theta)} = 0 \quad \text{elsewhere}.
\]

The p.d.f. is of the form $(1/\theta) f(x/\theta)$, where $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. We know (Section 7.5) that $Y_1 = X_1 + X_2$ is a complete sufficient statistic for $\theta$. Hence $Y_1$ is independent of every scale-invariant statistic $u(X_1, X_2)$ with the property $u(cx_1, cx_2) = u(x_1, x_2)$. Illustrations of these are $X_1/X_2$ and $X_1/(X_1 + X_2)$, statistics that have $F$ and beta distributions, respectively.
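
As a numerical illustration of Example 3, the sketch below simulates pairs from the exponential distribution and compares the sum with the scale-invariant ratio. It is an added illustration under stated assumptions: NumPy is assumed, and the value of `theta`, the seed, and the variable names are arbitrary. With two exponential observations the ratio $X_1/(X_1 + X_2)$ is beta with parameters (1, 1), so its mean and variance should be close to 1/2 and 1/12.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.5  # arbitrary scale parameter chosen for the illustration
x1 = rng.exponential(scale=theta, size=200_000)
x2 = rng.exponential(scale=theta, size=200_000)

y1 = x1 + x2            # complete sufficient statistic for theta
ratio = x1 / (x1 + x2)  # scale-invariant, so its distribution is free of theta

corr_sum_ratio = np.corrcoef(y1, ratio)[0, 1]
print(f"correlation of Y1 and X1/(X1+X2): {corr_sum_ratio:.4f}")
print(f"ratio mean {ratio.mean():.4f}, ratio variance {ratio.var():.4f}")
```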

Example 4. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. In Example 2, Section 7.7, it was proved that the mean $\bar X$ and the variance $S^2$ of the sample are joint complete sufficient statistics for $\theta_1$ and $\theta_2$. Consider the statistic

\[
Z = \frac{\sum_{i=1}^{n-1} (X_{i+1} - X_i)^2}{\sum_{i=1}^{n} (X_i - \bar X)^2} = u(X_1, X_2, \ldots, X_n),
\]

which satisfies the property that $u(cx_1 + d, \ldots, cx_n + d) = u(x_1, \ldots, x_n)$. That is, the ancillary statistic $Z$ is independent of both $\bar X$ and $S^2$.

Let $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$ denote two normal distributions. Recall that in Example 2, Section 6.5, a statistic, which was denoted by $T$, was used to test the hypothesis that $\theta_1 = \theta_2$, provided that the unknown variances $\theta_3$ and $\theta_4$ were equal. The hypothesis that $\theta_1 = \theta_2$ is rejected if the computed $|T| > c$, where the constant $c$ is selected so that $\alpha_2 = \Pr(|T| > c;\ \theta_1 = \theta_2,\ \theta_3 = \theta_4)$ is the assigned significance level of the test. We shall show that, if $\theta_3 = \theta_4$, $F$ of Exercise 6.52 and $T$ are independent. Among other things, this means that if these two tests based on $F$ and $T$, respectively, are performed sequentially, with significance levels $\alpha_1$ and $\alpha_2$, the probability of accepting both these hypotheses, when they are true, is $(1 - \alpha_1)(1 - \alpha_2)$. Thus the significance level of this joint test is $\alpha = 1 - (1 - \alpha_1)(1 - \alpha_2)$.

The independence of $F$ and $T$, when $\theta_3 = \theta_4$, can be established by an appeal to sufficiency and completeness. The three statistics $\bar X$, $\bar Y$, and $\sum_i (X_i - \bar X)^2 + \sum_j (Y_j - \bar Y)^2$ are joint complete sufficient statistics for the three parameters $\theta_1$, $\theta_2$, and $\theta_3 = \theta_4$. Obviously, the distribution of $F$ does not depend upon $\theta_1$, $\theta_2$, or $\theta_3 = \theta_4$, and hence $F$ is independent of the three joint complete sufficient statistics. However, $T$ is a function of these three joint complete sufficient statistics alone, and, accordingly, $T$ is independent of $F$. It is important to note that these two statistics are independent whether $\theta_1 = \theta_2$ or $\theta_1 \ne \theta_2$. This permits us to calculate probabilities other than the significance level of the test. For example, if $\theta_3 = \theta_4$ and $\theta_1 \ne \theta_2$, then

\[
\Pr(c_1 < F < c_2,\ |T| > c) = \Pr(c_1 < F < c_2)\, \Pr(|T| > c).
\]

The second factor in the right-hand member is evaluated by using the probabilities for what is called a noncentral $t$-distribution. Of course, if $\theta_3 = \theta_4$ and the difference $\theta_1 - \theta_2$ is large, we would want the preceding probability to be close to 1 because the event $\{c_1 < F < c_2,\ |T| > c\}$ leads to a correct decision, namely accept $\theta_3 = \theta_4$ and reject $\theta_1 = \theta_2$.

In this section we have given several examples in which the complete sufficient statistics are independent of ancillary statistics. Thus, in those cases, the ancillary statistics provide no information about the parameters. However, if the sufficient statistics are not complete, the ancillary statistics could provide some information as the following example demonstrates.

Example 5. We refer back to Examples 1 and 2 of Section 7.8. There the first and $n$th order statistics, $Y_1$ and $Y_n$, were minimal sufficient statistics for $\theta$, where the sample arose from an underlying distribution having p.d.f. $(\tfrac{1}{2}) I_{(\theta - 1,\, \theta + 1)}(x)$. Often $T_1 = (Y_1 + Y_n)/2$ is used as an estimator of $\theta$ as it is a function of those sufficient statistics which is unbiased. Let us find a relationship between $T_1$ and the ancillary statistic $T_2 = Y_n - Y_1$.

The joint p.d.f. of $Y_1$ and $Y_n$ is

\[
g(y_1, y_n; \theta) = n(n-1)(y_n - y_1)^{n-2}/2^n, \quad \theta - 1 < y_1 < y_n < \theta + 1,
\]

zero elsewhere. Accordingly, the joint p.d.f. of $T_1$ and $T_2$ is, since the absolute value of the Jacobian equals 1,

\[
h(t_1, t_2; \theta) = n(n-1)\, t_2^{\,n-2}/2^n, \quad \theta - 1 + \frac{t_2}{2} < t_1 < \theta + 1 - \frac{t_2}{2}, \quad 0 < t_2 < 2,
\]

zero elsewhere. Thus the p.d.f. of $T_2$ is

\[
h_2(t_2; \theta) = n(n-1)\, t_2^{\,n-2} (2 - t_2)/2^n, \quad 0 < t_2 < 2,
\]

zero elsewhere, which of course is free of $\theta$ as $T_2$ is an ancillary statistic. Thus the conditional p.d.f. of $T_1$, given $T_2 = t_2$, is

\[
h_{1|2}(t_1 \mid t_2; \theta) = \frac{1}{2 - t_2}, \quad \theta - 1 + \frac{t_2}{2} < t_1 < \theta + 1 - \frac{t_2}{2}, \quad 0 < t_2 < 2,
\]

zero elsewhere. Note that this is uniform on the interval $(\theta - 1 + t_2/2,\ \theta + 1 - t_2/2)$; so the conditional mean and variance of $T_1$ are, respectively,

\[
E(T_1 \mid t_2) = \theta \quad \text{and} \quad \operatorname{var}(T_1 \mid t_2) = \frac{(2 - t_2)^2}{12}.
\]

That is, given $T_2 = t_2$, we know something about the conditional variance of $T_1$. In particular, if that observed value of $T_2$ is large (close to 2), that variance is small and we can place more reliance on the estimator $T_1$. On the other hand, a small value of $t_2$ means that we have less confidence in $T_1$ as an estimator of $\theta$. It is extremely interesting to note that this conditional variance does not depend upon the sample size $n$ but only on the given value of $T_2 = t_2$. Of course, as the sample size increases, $T_2$ tends to become larger and, in those cases, $T_1$ has smaller conditional variance.
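
The conditional variance formula of Example 5 can be compared with simulated data. The sketch below is an added illustration, not the authors' computation; it assumes NumPy, and the values of `theta`, `n`, the seed, and the bin half-width 0.05 used to approximate the event $T_2 = t_2$ are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 10.0, 5, 400_000  # illustrative values only
x = rng.uniform(theta - 1.0, theta + 1.0, size=(reps, n))
y1, yn = x.min(axis=1), x.max(axis=1)
t1 = (y1 + yn) / 2.0  # estimator of theta
t2 = yn - y1          # range, the ancillary statistic


def conditional_variance(t2_value):
    """Conditional variance of T1 given T2 = t2, as derived in Example 5."""
    return (2.0 - t2_value) ** 2 / 12.0


# Approximate conditioning on T2 = center by keeping samples with T2 near center.
for center in (0.5, 1.0, 1.5):
    in_bin = np.abs(t2 - center) < 0.05
    print(center, t1[in_bin].var(), conditional_variance(center))
```

Larger observed ranges should show visibly smaller empirical variances of $T_1$, in line with the discussion above.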
While Example 5 is a special one demonstrating mathematically that an ancillary statistic can provide some help in point estimation, this does actually happen in practice too. For illustration, we know that if the sample size is large enough, then

\[
T = \frac{\bar X - \mu}{S/\sqrt{n-1}}
\]

has an approximate standard normal distribution. Of course, if the sample arises from a normal distribution, $\bar X$ and $S$ are independent and $T$ has a $t$-distribution with $n - 1$ degrees of freedom. Even if the sample arises from a symmetric distribution, $\bar X$ and $S$ are uncorrelated and $T$ has an approximate $t$-distribution and certainly an approximate standard normal distribution with sample sizes around 30 or 40. On the other hand, if the sample arises from a highly skewed distribution (say to the right), then $\bar X$ and $S$ are highly correlated and the probability $\Pr(-1.96 < T < 1.96)$ is not necessarily close to 0.95 unless the sample size is extremely large (certainly much greater than 30). Intuitively, one can understand why this correlation exists if the underlying distribution is highly skewed to the right. While $S$ has a distribution free of $\mu$ (and hence is an ancillary), a large value of $S$ implies a large value of $\bar X$, since the underlying p.d.f. is like the one depicted in Figure 7.1. Of course, a small value of $\bar X$ (say less than the mode) requires a relatively small value of $S$. This means that unless $n$ is extremely large, it is risky to say that

\[
\left( \bar x - \frac{1.96\, s}{\sqrt{n-1}},\ \bar x + \frac{1.96\, s}{\sqrt{n-1}} \right)
\]

provides an approximate 95 percent confidence interval with data from a very skewed distribution. As a matter of fact, the authors have seen situations in which this confidence coefficient is closer to 70 percent, rather than 95 percent, with sample sizes of 30 to 40.
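
The loss of coverage with skewed data can be explored by simulation. The sketch below is an added illustration under assumptions that are not in the text: NumPy is assumed, and the lognormal distribution with log-scale standard deviation 2 is simply one convenient, heavily right-skewed choice. The function name, seed, and replication count are hypothetical.

```python
import numpy as np


def coverage(draw, true_mean, n=30, reps=50_000, seed=3):
    """Fraction of samples whose interval x-bar +/- 1.96 s/sqrt(n-1) covers true_mean.

    draw(rng, size) must return an array of observations; s uses divisor n,
    matching the definition of S in the text.
    """
    rng = np.random.default_rng(seed)
    x = draw(rng, (reps, n))
    xbar = x.mean(axis=1)
    half_width = 1.96 * x.std(axis=1) / np.sqrt(n - 1)
    return float(np.mean(np.abs(xbar - true_mean) < half_width))


normal_cov = coverage(lambda rng, size: rng.normal(0.0, 1.0, size), 0.0)
# Lognormal with log-scale standard deviation 2 is heavily right-skewed;
# its mean is exp(2**2 / 2) = exp(2).
skewed_cov = coverage(lambda rng, size: rng.lognormal(0.0, 2.0, size), np.exp(2.0))
print(f"normal: {normal_cov:.3f}, skewed: {skewed_cov:.3f}")
```

With normal data the estimated coverage should sit near the nominal level, while the skewed case should fall well short of it at $n = 30$.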



FIGURE 7.1

EXERCISES

7.57. Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random sample of size $n = 4$ from a distribution having p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere, where $0 < \theta < \infty$. Argue that the complete sufficient statistic $Y_4$ for $\theta$ is independent of each of the statistics $Y_1/Y_4$ and $(Y_1 + Y_2)/(Y_3 + Y_4)$.
Hint: Show that the p.d.f. is of the form $(1/\theta) f(x/\theta)$, where $f(x) = 1$, $0 < x < 1$, zero elsewhere.

7.58. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from the normal distribution $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$. Show that the distribution of $Z = Y_n - \bar Y$ does not depend upon $\theta$. Thus $\bar Y = \sum_{1}^{n} Y_i/n$, a complete sufficient statistic for $\theta$, is independent of $Z$.

7.59. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$. Prove that a necessary and sufficient condition that the statistics $Z = \sum_{1}^{n} a_i X_i$ and $Y = \sum_{1}^{n} X_i$, a complete sufficient statistic for $\theta$, be independent is that $\sum_{1}^{n} a_i = 0$.

7.60. Let $X$ and $Y$ be random variables such that $E(X^k)$ and $E(Y^k) \ne 0$ exist for $k = 1, 2, 3, \ldots$. If the ratio $X/Y$ and its denominator $Y$ are independent, prove that $E[(X/Y)^k] = E(X^k)/E(Y^k)$, $k = 1, 2, 3, \ldots$.
Hint: Write $E(X^k) = E[Y^k (X/Y)^k]$.

7.61. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from a distribution that has p.d.f. $f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, $0 < \theta < \infty$, zero elsewhere. Show that the ratio $R$ and its denominator (a complete sufficient statistic for $\theta$) are independent. Use the result of the preceding exercise to determine $E(R^k)$, $k = 1, 2, 3, \ldots$.

7.62. Let $X_1, X_2, \ldots, X_5$ be a random sample of size 5 from the distribution that has p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Show that $(X_1 + X_2)/(X_1 + X_2 + \cdots + X_5)$ and its denominator are independent.
Hint: The p.d.f. $f(x)$ is a member of $\{f(x; \theta) : 0 < \theta < \infty\}$, where $f(x; \theta) = (1/\theta) e^{-x/\theta}$, $0 < x < \infty$, zero elsewhere.

7.63. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from the normal distribution $N(\theta_1, \theta_2)$, $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Show that the joint complete sufficient statistics $\bar X = \bar Y$ and $S^2$ for $\theta_1$ and $\theta_2$ are independent of each of $(Y_n - \bar Y)/S$ and $(Y_n - Y_1)/S$.

7.64. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample from a distribution with the p.d.f.

\[
f(x; \theta_1, \theta_2) = \frac{1}{\theta_2} \exp\left( -\frac{x - \theta_1}{\theta_2} \right),
\]

$\theta_1 < x < \infty$, zero elsewhere, where $-\infty < \theta_1 < \infty$, $0 < \theta_2 < \infty$. Show that the joint complete sufficient statistics $Y_1$ and $\bar X = \bar Y$ for $\theta_1$ and $\theta_2$ are independent of $(Y_2 - Y_1) / \sum_{1}^{n} (Y_i - Y_1)$.


7.65. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from the normal distribution $N(0, \theta)$.

(a) Argue that the ratio $R = (X_1^2 + X_2^2)/(X_1^2 + \cdots + X_5^2)$ and its denominator $(X_1^2 + \cdots + X_5^2)$ are independent.

(b) Does $5R/2$ have an $F$-distribution with 2 and 5 degrees of freedom? Explain your answer.

(c) Compute $E(R)$ using Exercise 7.60.

7.66. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from a distribution having p.d.f.

\[
f(x; \theta) = (1/\theta) \exp(-x/\theta), \quad 0 < x < \infty,
\]

and equal zero elsewhere, where $0 < \theta < \infty$. Show that $W = \sum_{1}^{n} Y_i$ and $Z = nY_1/W$ are independent. Find $E(Z^k)$, $k = 1, 2, 3, \ldots$, using the result of Exercise 7.60. What is the distribution of $Z$?

7.67. Referring to Example 5 of this section, determine $c$ so that

\[
\Pr(-c < T_1 - \theta < c \mid T_2 = t_2) = 0.95.
\]

Use this result to find a 95 percent confidence interval for $\theta$, given $T_2 = t_2$; and note how its length is smaller when the range $t_2$ is larger.


ADDITIONAL EXERCISES 


7.68. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f. …

(a) What is the complete sufficient statistic, say $Y$, for $\theta$?
(b) What function of $Y$ is an unbiased estimator of $\theta$?

7.69. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of size $n$ from a distribution with p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere. The statistic $Y_n$ is a complete sufficient statistic for $\theta$ and it has p.d.f.

\[
g(y_n; \theta) = \frac{n\, y_n^{\,n-1}}{\theta^n}, \quad 0 < y_n < \theta,
\]

and zero elsewhere.

(a) Find the distribution function $H_n(z; \theta)$ of $Z = n(\theta - Y_n)$.

(b) Find the $\lim_{n \to \infty} H_n(z; \theta)$ and thus the limiting distribution of $Z$.


7.70. Let $X_1, \ldots, X_n$; $Y_1, \ldots, Y_n$; $Z_1, \ldots, Z_n$ be respective independent random samples from three normal distributions $N(\mu_1 = \alpha + \beta, \sigma^2)$, $N(\mu_2 = \beta + \gamma, \sigma^2)$, $N(\mu_3 = \alpha + \gamma, \sigma^2)$. Find a point estimator for $\beta$ that is based on $\bar X$, $\bar Y$, $\bar Z$. Is this estimator unique? Why? If $\sigma^2$ is unknown, explain how to find a confidence interval for $\beta$.

7.71. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\theta$. Find the conditional expectation $E(X_1 + 2X_2 + 3X_3 \mid \sum_{1}^{n} X_i)$.

7.72. Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the normal distribution $N(\theta, 1)$. Find the unbiased minimum variance estimator of $\theta^2$.

7.73. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\theta$. Find the unbiased minimum variance estimator of $\theta^2$.

7.74. We consider a random sample $X_1, X_2, \ldots, X_n$ from a distribution with p.d.f. $f(x; \theta) = (1/\theta) \exp(-x/\theta)$, $0 < x < \infty$, zero elsewhere, where $0 < \theta$. We only observe the first $r$ order statistics, $Y_1 < Y_2 < \cdots < Y_r$.

(a) Record the joint p.d.f. of these order statistics and denote it by $L(\theta)$.

(b) Under these conditions, find the m.l.e., $\hat\theta$, by maximizing $L(\theta)$.

(c) Find the m.g.f. and p.d.f. of $\hat\theta$.

(d) With a slight extension of the definition of sufficiency, is $\hat\theta$ a sufficient statistic?

(e) Find the unbiased minimum variance estimator for $\theta$.

(f) Show that … and $\hat\theta$ are independent.

7.75. Let us repeat Bernoulli trials with parameter $\theta$ until $k$ successes occur. If $Y$ is the number of trials needed:

(a) Show that the p.d.f. of $Y$ is $g(y; \theta) = \binom{y-1}{k-1} \theta^k (1 - \theta)^{y-k}$, $y = k, k + 1, \ldots$, zero elsewhere, where $0 < \theta < 1$.

(b) Prove that this family of probability density functions is complete.

(c) Demonstrate that $E[(k - 1)/(Y - 1)] = \theta$.

(d) Is it possible to find another statistic, which is a function of $Y$ alone, that is unbiased? Why?

7.76. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f. $f(x; \theta) = \theta^x (1 - \theta)$, $x = 0, 1, 2, \ldots$, zero elsewhere, where $0 < \theta < 1$.

(a) Find the m.l.e., $\hat\theta$, of $\theta$.

(b) Show that $\sum_{1}^{n} X_i$ is a complete sufficient statistic for $\theta$.

(c) Determine the unbiased minimum variance estimator of $\theta$.

7.77. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f. $f(x; \theta) = $ …, $0 < x < \infty$, zero elsewhere, where $0 < \theta < \infty$.

(a) Find the m.l.e., $\hat\theta$, of $\theta$. Is $\hat\theta$ unbiased?
Hint: First find the p.d.f. of $Y = \sum_{1}^{n} X_i$ and then compute $E(\hat\theta)$.

(b) Argue that $Y$ is a complete sufficient statistic for $\theta$.

(c) Find the unbiased minimum variance estimator of $\theta$.

(d) Show that $X_1/Y$ and $Y$ are independent.

(e) What is the distribution of $X_1/Y$?


CHAPTER 8

More About Estimation


8.1 Bayesian Estimation 

In Chapter 6 we introduced point and interval estimation for 
various parameters. In Chapter 7 we observed how such inferences 
should be based upon sufficient statistics for the parameters if they 
exist. In this chapter we introduce other concepts related to estimation 
and begin this by considering Bayesian estimates, which are also based 
upon sufficient statistics if the latter exist.

In introducing the interesting and sometimes controversial Bayesian method of estimation, the student should constantly keep in mind that making statistical inferences from the data does not strictly follow a mathematical approach. Clearly, up to now, we have had to construct models before we have been able to make such inferences. These models are subjective, and the resulting inference depends greatly on the model selected. For illustration, two statisticians could very well select different models for exactly the same situation and make different inferences with exactly the same data. Most statisticians would use some type of model diagnostics to see if the models seem to be reasonable ones, but we must still recognize that there can be differences among statisticians' inferences.

We shall now describe the Bayesian approach to the problem of estimation. This approach takes into account any prior knowledge of the experiment that the statistician has and it is one application of a principle of statistical inference that may be called Bayesian statistics. Consider a random variable $X$ that has a distribution of probability that depends upon the symbol $\theta$, where $\theta$ is an element of a well-defined set $\Omega$. For example, if the symbol $\theta$ is the mean of a normal distribution, $\Omega$ may be the real line. We have previously looked upon $\theta$ as being some constant, although an unknown constant. Let us now introduce a random variable $\Theta$ that has a distribution of probability over the set $\Omega$; and, just as we look upon $x$ as a possible value of the random variable $X$, we now look upon $\theta$ as a possible value of the random variable $\Theta$. Thus the distribution of $X$ depends upon $\theta$, an experimental value of the random variable $\Theta$. We shall denote the p.d.f. of $\Theta$ by $h(\theta)$ and we take $h(\theta) = 0$ when $\theta$ is not an element of $\Omega$. Moreover, we now denote the p.d.f. of $X$ by $f(x \mid \theta)$ since we think of it as a conditional p.d.f. of $X$, given $\Theta = \theta$.

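Say $X_1, X_2, \ldots, X_n$ is a random sample from this conditional distribution of $X$. Thus we can write the joint conditional p.d.f. of $X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, as

\[
f(x_1 \mid \theta) f(x_2 \mid \theta) \cdots f(x_n \mid \theta).
\]

Thus the joint p.d.f. of $X_1, X_2, \ldots, X_n$ and $\Theta$ is

\[
g(x_1, x_2, \ldots, x_n, \theta) = f(x_1 \mid \theta) f(x_2 \mid \theta) \cdots f(x_n \mid \theta)\, h(\theta).
\]

If $\Theta$ is a random variable of the continuous type, the joint marginal p.d.f. of $X_1, X_2, \ldots, X_n$ is given by

\[
g_1(x_1, x_2, \ldots, x_n) = \int_{-\infty}^{\infty} g(x_1, x_2, \ldots, x_n, \theta)\, d\theta.
\]

If $\Theta$ is a random variable of the discrete type, integration would be replaced by summation. In either case the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

\[
k(\theta \mid x_1, x_2, \ldots, x_n) = \frac{g(x_1, x_2, \ldots, x_n, \theta)}{g_1(x_1, x_2, \ldots, x_n)} = \frac{f(x_1 \mid \theta) f(x_2 \mid \theta) \cdots f(x_n \mid \theta)\, h(\theta)}{g_1(x_1, x_2, \ldots, x_n)}.
\]

This relationship is another form of Bayes' formula.

Example 1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\theta$, where $\theta$ is the observed value of a random variable $\Theta$ having a gamma distribution with known parameters $\alpha$ and $\beta$. Thus

\[
g(x_1, \ldots, x_n, \theta) = \left[ \frac{\theta^{x_1} e^{-\theta}}{x_1!} \cdots \frac{\theta^{x_n} e^{-\theta}}{x_n!} \right] \left[ \frac{\theta^{\alpha - 1} e^{-\theta/\beta}}{\Gamma(\alpha) \beta^{\alpha}} \right],
\]

provided that $x_i = 0, 1, 2, 3, \ldots$, $i = 1, 2, \ldots, n$ and $0 < \theta < \infty$, and is equal to zero elsewhere. Then

\[
g_1(x_1, \ldots, x_n) = \int_0^{\infty} \frac{\theta^{\sum x_i + \alpha - 1} e^{-(n + 1/\beta)\theta}}{x_1! \cdots x_n!\, \Gamma(\alpha) \beta^{\alpha}}\, d\theta = \frac{\Gamma\left( \sum x_i + \alpha \right)}{x_1! \cdots x_n!\, \Gamma(\alpha) \beta^{\alpha} (n + 1/\beta)^{\sum x_i + \alpha}}.
\]

Finally, the conditional p.d.f. of $\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

\[
k(\theta \mid x_1, \ldots, x_n) = \frac{g(x_1, \ldots, x_n, \theta)}{g_1(x_1, \ldots, x_n)} = \frac{\theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]}}{\Gamma\left( \sum x_i + \alpha \right) [\beta/(n\beta + 1)]^{\sum x_i + \alpha}},
\]

provided that $0 < \theta < \infty$, and is equal to zero elsewhere. This conditional p.d.f. is one of the gamma type with parameters $\alpha^* = \sum x_i + \alpha$ and $\beta^* = \beta/(n\beta + 1)$.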
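
The update of the gamma parameters in Example 1 is a two-line computation. The sketch below is an added illustration; the function name and the data are hypothetical, and $\beta$ is treated as a scale parameter, as in the gamma p.d.f. used in the example.

```python
def gamma_posterior(xs, alpha, beta):
    """Posterior gamma parameters (alpha*, beta*) for Poisson counts xs.

    The prior on theta is gamma with shape alpha and scale beta, as in Example 1.
    """
    n = len(xs)
    alpha_star = sum(xs) + alpha
    beta_star = beta / (n * beta + 1.0)
    return alpha_star, beta_star


counts = [3, 0, 2, 4, 1]  # made-up data for illustration
alpha_star, beta_star = gamma_posterior(counts, alpha=2.0, beta=0.5)
print(alpha_star, beta_star)
print("posterior mean:", alpha_star * beta_star)
```

The mean of a gamma distribution with shape $\alpha^*$ and scale $\beta^*$ is $\alpha^* \beta^*$, which is what the last line prints.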


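In Example 1 it is extremely convenient to notice that it is not really necessary to determine $g_1(x_1, \ldots, x_n)$ to find $k(\theta \mid x_1, \ldots, x_n)$. If we divide

\[
f(x_1 \mid \theta) f(x_2 \mid \theta) \cdots f(x_n \mid \theta)\, h(\theta)
\]

by $g_1(x_1, \ldots, x_n)$, we must get the product of a factor, which depends upon $x_1, \ldots, x_n$ but does not depend upon $\theta$, say $c(x_1, \ldots, x_n)$, and

\[
\theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]}.
\]

That is,

\[
k(\theta \mid x_1, \ldots, x_n) = c(x_1, \ldots, x_n)\, \theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]},
\]

provided that $0 < \theta < \infty$ and $x_i = 0, 1, 2, \ldots$, $i = 1, 2, \ldots, n$. However, $c(x_1, \ldots, x_n)$ must be that "constant" needed to make $k(\theta \mid x_1, \ldots, x_n)$ a p.d.f., namely

\[
c(x_1, \ldots, x_n) = \frac{1}{\Gamma\left( \sum x_i + \alpha \right) [\beta/(n\beta + 1)]^{\sum x_i + \alpha}}.
\]

Accordingly, Bayesian statisticians frequently write that $k(\theta \mid x_1, \ldots, x_n)$ is proportional to $g(x_1, x_2, \ldots, x_n, \theta)$; that is,

\[
k(\theta \mid x_1, \ldots, x_n) \propto f(x_1 \mid \theta) \cdots f(x_n \mid \theta)\, h(\theta).
\]

Note that in the right-hand member of this expression all factors involving constants and $x_1, \ldots, x_n$ alone (not $\theta$) can be dropped. For illustration, in solving the problem presented in Example 1, the Bayesian statistician would simply write

\[
k(\theta \mid x_1, \ldots, x_n) \propto \theta^{\sum x_i} e^{-n\theta}\, \theta^{\alpha - 1} e^{-\theta/\beta}
\]

or, equivalently,

\[
k(\theta \mid x_1, \ldots, x_n) \propto \theta^{\sum x_i + \alpha - 1} e^{-\theta/[\beta/(n\beta + 1)]},
\]

$0 < \theta < \infty$ and is equal to zero elsewhere. Clearly, $k(\theta \mid x_1, \ldots, x_n)$ must be a gamma p.d.f. with parameters $\alpha^* = \sum x_i + \alpha$ and $\beta^* = \beta/(n\beta + 1)$.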

There is another observation that can be made at this point. Suppose that there exists a sufficient statistic $Y = u(X_1, \ldots, X_n)$ for the parameter so that

\[
f(x_1 \mid \theta) \cdots f(x_n \mid \theta) = g[u(x_1, \ldots, x_n) \mid \theta]\, H(x_1, \ldots, x_n),
\]

where now $g(y \mid \theta)$ is the p.d.f. of $Y$, given $\Theta = \theta$. Then we note that

\[
k(\theta \mid x_1, \ldots, x_n) \propto g[u(x_1, \ldots, x_n) \mid \theta]\, h(\theta)
\]

because the factor $H(x_1, \ldots, x_n)$ that does not depend upon $\theta$ can be dropped. Thus, if a sufficient statistic $Y$ for the parameter exists, we can begin with the p.d.f. of $Y$ if we wish and write

\[
k(\theta \mid y) \propto g(y \mid \theta)\, h(\theta),
\]

where now $k(\theta \mid y)$ is the conditional p.d.f. of $\Theta$, given the sufficient statistic $Y = y$. The following discussion assumes that a sufficient statistic $Y$ does exist; but more generally, we could replace $Y$ by $X_1, X_2, \ldots, X_n$ in what follows. Also, we now use $g_1(y)$ to be the marginal p.d.f. of $Y$; that is, in the continuous case,

\[
g_1(y) = \int_{-\infty}^{\infty} g(y \mid \theta)\, h(\theta)\, d\theta.
\]

In Bayesian statistics, the p.d.f. $h(\theta)$ is called the prior p.d.f. of $\Theta$, and the conditional p.d.f. $k(\theta \mid y)$ is called the posterior p.d.f. of $\Theta$. This is because $h(\theta)$ is the p.d.f. of $\Theta$ prior to the observation of $Y$, whereas $k(\theta \mid y)$ is the p.d.f. of $\Theta$ after the observation of $Y$ has been made. In many instances, $h(\theta)$ is not known; yet the choice of $h(\theta)$ affects the p.d.f. $k(\theta \mid y)$. In these instances the statistician takes into account all prior knowledge of the experiment and assigns the prior p.d.f. $h(\theta)$. This, of course, injects the problem of personal or subjective probability (see the Remark, Section 1.1).

Suppose that we want a point estimate of $\theta$. From the Bayesian viewpoint, this really amounts to selecting a decision function $\delta$, so that $\delta(y)$ is a predicted value of $\theta$ (an experimental value of the random variable $\Theta$) when both the computed value $y$ and the conditional p.d.f. $k(\theta \mid y)$ are known. Now, in general, how would we predict an experimental value of any random variable, say $W$, if we want our prediction to be "reasonably close" to the value to be observed? Many statisticians would predict the mean, $E(W)$, of the distribution of $W$; others would predict a median (perhaps unique) of the distribution of $W$; some would predict a mode (perhaps unique) of the distribution of $W$; and some would have other predictions. However, it seems desirable that the choice of the decision function should depend upon the loss function $\mathcal{L}[\theta, \delta(y)]$. One way in which this dependence upon the loss function can be reflected is to select the decision function $\delta$ in such a way that the conditional expectation of the loss is a minimum. A Bayes' solution is a decision function $\delta$ that minimizes

\[
E\{\mathcal{L}[\Theta, \delta(y)] \mid Y = y\} = \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, k(\theta \mid y)\, d\theta,
\]

if $\Theta$ is a random variable of the continuous type. The usual modification of the right-hand member of this equation is made for random variables of the discrete type. If, for example, the loss function is given by $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$, the Bayes' solution is given by $\delta(y) = E(\Theta \mid y)$, the mean of the conditional distribution of $\Theta$, given $Y = y$. This follows from the fact that $E[(W - b)^2]$, if it exists, is a minimum when $b = E(W)$. If the loss function is given by $\mathcal{L}[\theta, \delta(y)] = |\theta - \delta(y)|$, then a median of the conditional distribution of $\Theta$, given $Y = y$, is the Bayes' solution. This follows from the fact that $E(|W - b|)$, if it exists, is a minimum when $b$ is equal to any median of the distribution of $W$.
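
The two minimization facts quoted above (the mean minimizes expected squared error and a median minimizes expected absolute error) can be illustrated with a simple grid search over an empirical distribution. This is an added sketch, assuming NumPy; the gamma sample standing in for $W$, the seed, and the grid size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
w = rng.gamma(shape=2.0, scale=1.5, size=20_001)  # skewed, so mean and median differ

# Evaluate both average losses for each candidate prediction b on a grid.
candidates = np.linspace(w.min(), w.max(), 2001)
squared_loss = np.array([np.mean((w - b) ** 2) for b in candidates])
absolute_loss = np.array([np.mean(np.abs(w - b)) for b in candidates])

best_squared = candidates[squared_loss.argmin()]
best_absolute = candidates[absolute_loss.argmin()]
print("squared error:  best b", best_squared, " sample mean  ", w.mean())
print("absolute error: best b", best_absolute, " sample median", np.median(w))
```

Up to the grid spacing, the first minimizer should agree with the sample mean and the second with the sample median.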

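The conditional expectation of the loss, given $Y = y$, defines a random variable that is a function of the statistic $Y$. The expected value of that function of $Y$, in the notation of this section, is given by

\[
\int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, k(\theta \mid y)\, d\theta \right\} g_1(y)\, dy = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, g(y \mid \theta)\, dy \right\} h(\theta)\, d\theta,
\]

in the continuous case. The integral within the braces in the latter expression is, for every given $\theta \in \Omega$, the risk function $R(\theta, \delta)$; accordingly, the latter expression is the mean value of the risk, or the expected risk. Because a Bayes' solution minimizes

\[
\int_{-\infty}^{\infty} \mathcal{L}[\theta, \delta(y)]\, k(\theta \mid y)\, d\theta
\]

for every $y$ for which $g_1(y) > 0$, it is evident that a Bayes' solution $\delta(y)$ minimizes this mean value of the risk. We now give an illustrative example.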

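Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $b(1, \theta)$, $0 < \theta < 1$. We seek a decision function $\delta$ that is a Bayes' solution. The sufficient statistic is $Y = \sum_{1}^{n} X_i$, and $Y$ is $b(n, \theta)$. That is, the conditional p.d.f. of $Y$, given $\Theta = \theta$, is

\[
g(y \mid \theta) = \binom{n}{y} \theta^y (1 - \theta)^{n - y}, \quad y = 0, 1, \ldots, n,
\]
\[
\phantom{g(y \mid \theta)} = 0 \quad \text{elsewhere}.
\]

We take the prior p.d.f. of the random variable $\Theta$ to be

\[
h(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad 0 < \theta < 1,
\]
\[
\phantom{h(\theta)} = 0 \quad \text{elsewhere},
\]

where $\alpha$ and $\beta$ are assigned positive constants. Thus the conditional p.d.f. of $\Theta$, given $Y = y$, is, at points of positive probability density,

\[
k(\theta \mid y) \propto \theta^y (1 - \theta)^{n - y}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \quad 0 < \theta < 1.
\]

That is,

\[
k(\theta \mid y) = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(\alpha + y) \Gamma(n + \beta - y)} \theta^{\alpha + y - 1} (1 - \theta)^{\beta + n - y - 1}, \quad 0 < \theta < 1,
\]

and $y = 0, 1, \ldots, n$. We take the loss function to be $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. Because $Y$ is a random variable of the discrete type, whereas $\Theta$ is of the continuous type, we have for the expected risk,

\[
\int_0^1 \left\{ \sum_{y=0}^{n} [\theta - \delta(y)]^2 \binom{n}{y} \theta^y (1 - \theta)^{n - y} \right\} h(\theta)\, d\theta = \sum_{y=0}^{n} \left\{ \int_0^1 [\theta - \delta(y)]^2\, k(\theta \mid y)\, d\theta \right\} g_1(y).
\]

The Bayes' solution $\delta(y)$ is the mean of the conditional distribution of $\Theta$, given $Y = y$. Thus

\[
\delta(y) = \int_0^1 \theta\, k(\theta \mid y)\, d\theta = \frac{\Gamma(n + \alpha + \beta)}{\Gamma(\alpha + y) \Gamma(n + \beta - y)} \int_0^1 \theta^{\alpha + y} (1 - \theta)^{\beta + n - y - 1}\, d\theta = \frac{\alpha + y}{\alpha + \beta + n}.
\]

This decision function $\delta(y)$ minimizes

\[
\int_0^1 [\theta - \delta(y)]^2\, k(\theta \mid y)\, d\theta
\]

for $y = 0, 1, \ldots, n$ and, accordingly, it minimizes the expected risk. It is very instructive to note that this Bayes' solution can be written as

\[
\delta(y) = \left( \frac{n}{\alpha + \beta + n} \right) \frac{y}{n} + \left( \frac{\alpha + \beta}{\alpha + \beta + n} \right) \frac{\alpha}{\alpha + \beta},
\]

which is a weighted average of the maximum likelihood estimate $y/n$ of $\theta$ and the mean $\alpha/(\alpha + \beta)$ of the prior p.d.f. of the parameter. Moreover, the respective weights are $n/(\alpha + \beta + n)$ and $(\alpha + \beta)/(\alpha + \beta + n)$. Thus we see that $\alpha$ and $\beta$ should be selected so that not only is $\alpha/(\alpha + \beta)$ the desired prior mean, but the sum $\alpha + \beta$ indicates the worth of the prior opinion, relative to a sample of size $n$. That is, if we want our prior opinion to have as much weight as a sample size of 20, we would take $\alpha + \beta = 20$. So if our prior mean is $\tfrac{3}{4}$, we have that $\alpha$ and $\beta$ are selected so that $\alpha = 15$ and $\beta = 5$.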
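
The weighted-average form of the Bayes' solution in Example 2 can be checked directly. The following is an added sketch with hypothetical function names and made-up data; it uses the prior with $\alpha = 15$ and $\beta = 5$ discussed above and shows the estimate moving toward $y/n$ as $n$ grows.

```python
def bayes_estimate(y, n, alpha, beta):
    """Posterior mean of theta under a beta(alpha, beta) prior and b(n, theta) data."""
    return (alpha + y) / (alpha + beta + n)


def weighted_average_form(y, n, alpha, beta):
    """Same estimate written as a weighted average of y/n and the prior mean."""
    total = alpha + beta + n
    return (n / total) * (y / n) + ((alpha + beta) / total) * (alpha / (alpha + beta))


# Prior mean 3/4 carrying the weight of 20 observations, as in the text.
alpha, beta = 15.0, 5.0
for n, y in [(10, 4), (100, 40), (1000, 400)]:
    print(n, y / n, bayes_estimate(y, n, alpha, beta), weighted_average_form(y, n, alpha, beta))
```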


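Example 3. Suppose that $Y = \bar X$, the sufficient statistic, is the mean of a random sample of size $n$ that arises from the normal distribution $N(\theta, \sigma^2)$, where $\sigma^2$ is known. Then $g(y \mid \theta)$ is $N(\theta, \sigma^2/n)$. Further suppose that we are able to assign prior knowledge to $\theta$ through a prior p.d.f. $h(\theta)$ that is $N(\theta_0, \sigma_0^2)$. Then we have that

\[
k(\theta \mid y) \propto \frac{1}{\sqrt{2\pi}\, \sigma/\sqrt{n}}\, \frac{1}{\sqrt{2\pi}\, \sigma_0} \exp\left[ -\frac{(y - \theta)^2}{2(\sigma^2/n)} - \frac{(\theta - \theta_0)^2}{2\sigma_0^2} \right].
\]

If we eliminate all constant factors (including factors involving $y$ only), we have

\[
k(\theta \mid y) \propto \exp\left[ -\frac{(\sigma_0^2 + \sigma^2/n)\theta^2 - 2(y\sigma_0^2 + \theta_0 \sigma^2/n)\theta}{2(\sigma^2/n)\sigma_0^2} \right].
\]

This can be simplified, by completing the square, to read (after eliminating factors not involving $\theta$)

\[
k(\theta \mid y) \propto \exp\left[ -\frac{\left( \theta - \dfrac{y\sigma_0^2 + \theta_0 \sigma^2/n}{\sigma_0^2 + \sigma^2/n} \right)^2}{\dfrac{2(\sigma^2/n)\sigma_0^2}{\sigma_0^2 + \sigma^2/n}} \right].
\]

That is, the posterior p.d.f. of the parameter is obviously normal with mean

\[
\frac{y\sigma_0^2 + \theta_0 \sigma^2/n}{\sigma_0^2 + \sigma^2/n} = \left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2/n} \right) y + \left( \frac{\sigma^2/n}{\sigma_0^2 + \sigma^2/n} \right) \theta_0
\]

and variance $(\sigma^2/n)\sigma_0^2/(\sigma_0^2 + \sigma^2/n)$. If the square-error loss function is used, this posterior mean is the Bayes' solution. Again, note that it is a weighted average of the maximum likelihood estimate $y = \bar x$ and the prior mean $\theta_0$. Observe here and in Example 2 that the Bayes' solution gets closer to the maximum likelihood estimate as $n$ increases. Thus the Bayesian procedures permit the decision maker to enter his or her prior opinions into the solution in a very formal way such that the influences of those prior notions will be less and less as $n$ increases.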

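In Bayesian statistics all the information is contained in the posterior p.d.f. $k(\theta \mid y)$. In Examples 2 and 3 we found Bayesian point estimates using the square-error loss function. It should be noted that if $\mathcal{L}[\delta(y), \theta] = |\delta(y) - \theta|$, the absolute value of the error, then the Bayes' solution would be the median of the posterior distribution of the parameter, which is given by $k(\theta \mid y)$. Hence the Bayes' solution changes, as it should, with different loss functions.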

If an interval estimate of $\theta$ is desired, we can now find two functions $u(y)$ and $v(y)$ so that the conditional probability

\[
\Pr[u(y) < \Theta < v(y) \mid Y = y] = \int_{u(y)}^{v(y)} k(\theta \mid y)\, d\theta
\]

is large, say 0.95. The experimental values of $X_1, X_2, \ldots, X_n$, say $x_1, x_2, \ldots, x_n$, provide us with an experimental value of $Y$, say $y$. Then the interval $u(y)$ to $v(y)$ is an interval estimate of $\theta$ in the sense that the conditional probability of $\Theta$ belonging to that interval is equal to 0.95. For illustration, in Example 3, where the posterior p.d.f. of the parameter was normal, the interval whose endpoints are found by taking the mean of that distribution and adding and subtracting 1.96 of its standard deviation,

\[
\frac{y\sigma_0^2 + \theta_0 \sigma^2/n}{\sigma_0^2 + \sigma^2/n} \pm 1.96 \sqrt{\frac{(\sigma^2/n)\sigma_0^2}{\sigma_0^2 + \sigma^2/n}},
\]

serves as an interval estimate for $\theta$ with posterior probability of 0.95.
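
The posterior mean, variance, and 95 percent interval for the normal case of Example 3 can be computed as follows. This is an added sketch, using the standard result for a normal mean with known variance and a normal prior; the function names and the numerical inputs are hypothetical.

```python
import math


def normal_posterior(y, n, sigma2, theta0, sigma0_2):
    """Posterior mean and variance of theta.

    y is the mean of n observations from N(theta, sigma2) with sigma2 known,
    and the prior on theta is N(theta0, sigma0_2).
    """
    v = sigma2 / n
    mean = (y * sigma0_2 + theta0 * v) / (sigma0_2 + v)
    var = v * sigma0_2 / (sigma0_2 + v)
    return mean, var


def interval_95(y, n, sigma2, theta0, sigma0_2):
    """Posterior mean plus and minus 1.96 posterior standard deviations."""
    mean, var = normal_posterior(y, n, sigma2, theta0, sigma0_2)
    half = 1.96 * math.sqrt(var)
    return mean - half, mean + half


# Hypothetical numbers: n = 25, sigma^2 = 4, prior N(10, 1), observed mean 11.
post_mean, post_var = normal_posterior(11.0, 25, 4.0, 10.0, 1.0)
print(post_mean, post_var)
print(interval_95(11.0, 25, 4.0, 10.0, 1.0))
```

The posterior mean lies between the prior mean 10 and the sample mean 11, and closer to the sample mean because $\sigma^2/n$ is small relative to the prior variance.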


EXERCISES 


? the prior p.d.f. of © onTwhfpSam that iS 

8.2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution that is $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$, where $\sigma^2$ is a given positive number. Let $Y = \bar X$, the mean of the random sample. Take the loss function to be $\mathcal{L}[\theta, \delta(y)] = |\theta - \delta(y)|$. If $\theta$ is an observed value of the random variable $\Theta$ that is $N(\mu, \tau^2)$, where $\tau^2 > 0$ and $\mu$ are known numbers, find the Bayes' solution $\delta(y)$ for a point estimate of $\theta$.

8.3. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution with mean $\theta$, $0 < \theta < \infty$. Let $Y = \sum_{1}^{n} X_i$ and take the loss function to be $\mathcal{L}[\theta, \delta(y)] = [\theta - \delta(y)]^2$. Let $\theta$ be an observed value of the random variable $\Theta$. If $\Theta$ has the p.d.f. $h(\theta) = \theta^{\alpha - 1} e^{-\theta/\beta} / [\Gamma(\alpha) \beta^{\alpha}]$, $0 < \theta < \infty$, zero elsewhere, where $\alpha > 0$, $\beta > 0$ are known numbers, find the Bayes' solution $\delta(y)$ for a point estimate of $\theta$.

i > I 

■ 

8.4. Let Y n be the nth order statistic of a random ■ 

distribution with p.dX f( x \0) = 1/0 Q <z x < f) sam P ,e size n from a 

loss function to be 邓， d(y n )] =： [Q ls(y )] 2 l \fl° eisewhere - Take the 

， random variable 0 , which has p.d.f Gbserved ㈣ 此 of 

elsewhere, with ^>0^>0. Find the Bayes 1 solm* ^ < ^^ero 

estimate of 0 m olution S(y n ) for a point 

8.1 Let Y r and Y 2 be statistics that have a trimw i i • 

parameters n, 9, ? and 0 2 . Here 0, and 0 2 are obser^T diStnbution wi£h 

vanables 0, and 0 2 , which have a Dirichlet d = r V h alues of the ^dom 

u wei dls tnbution with known 
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parameters a,, a 2 , and a 3 (see Example I 5 Section 4.5), Show that the 
conditional distribution of ©j and © 2 ^ Dirichlet and determine the 

conditional means 五 (®“y，and E(& 2 \yuyi)^ 

* 

8.6. Let $X$ be $N(0, 1/\theta)$. Assume that the unknown $\theta$ is a value of a random variable $\Theta$ which has a gamma distribution with parameters $\alpha = r/2$ and $\beta = 2/r$, where $r$ is a positive integer. Show that $X$ has a marginal $t$-distribution with $r$ degrees of freedom. This procedure is called compounding, and it may be used by a Bayesian statistician as a way of first presenting the $t$-distribution, as well as other distributions.

8.7. Let $X$ have a Poisson distribution with parameter $\theta$. Assume that the unknown $\theta$ is a value of a random variable $\Theta$ that has a gamma distribution with parameters $\alpha = r$ and $\beta = (1-p)/p$, where $r$ is a positive integer and $0 < p < 1$. Show, by the procedure of compounding, that $X$ has a marginal distribution which is negative binomial, a distribution that was introduced earlier (Section 3.1) under very different assumptions.

8.8. In Example 2 let $n = 30$, $\alpha = 10$, and $\beta = 5$ so that $\delta(y) = (10 + y)/45$ is the Bayes' estimate of $\theta$.

(a) If $Y$ has the binomial distribution $b(30, \theta)$, compute the risk $E\{[\theta - \delta(Y)]^2\}$.

(b) Determine those values of $\theta$ for which the risk of part (a) is less than $\theta(1-\theta)/30$, the risk associated with the maximum likelihood estimator $Y/n$ of $\theta$.

8.9. Let $Y_4$ be the largest order statistic of a sample of size $n = 4$ from a distribution with uniform p.d.f. $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere. If the prior p.d.f. of the parameter is $g(\theta) = 2/\theta^3$, $1 < \theta < \infty$, zero elsewhere, find the Bayesian estimator $\delta(Y_4)$ of $\theta$, based upon the sufficient statistic $Y_4$, using the loss function $|\delta(y_4) - \theta|$.

8.10. Consider a random sample $X_1, \ldots, X_n$ from the Weibull distribution with p.d.f. $f(x; \theta, \tau) = \theta\tau x^{\tau-1}e^{-\theta x^{\tau}}$, $0 < x < \infty$, where $0 < \theta$, $0 < \tau$, zero elsewhere.

(a) If $\tau$ is known, find the m.l.e. of $\theta$.

(b) If the parameter $\theta$ has a prior gamma p.d.f. $g(\theta)$ with parameters $\alpha$ and $\beta^* = 1/\beta$, show that the compound distribution is a Burr type with p.d.f. $h(x) = \alpha\beta^{\alpha}\tau x^{\tau-1}/(x^{\tau} + \beta)^{\alpha+1}$, $0 < x < \infty$, zero elsewhere.

(c) If, in the Burr distribution, $\tau$ and $\beta$ are known, find the m.l.e. of $\alpha$ based on a random sample of size $n$.

8.2 Fisher Information and the Rao-Cramér Inequality

Let $X$ be a random variable with p.d.f. $f(x; \theta)$, $\theta \in \Omega$, where the parameter space $\Omega$ is an interval. We consider only special cases, sometimes called regular cases, of probability density functions as we wish to differentiate under an integral (summation) sign. In particular, this means that the parameter $\theta$ does not appear in endpoints of the interval in which $f(x; \theta) > 0$.

With these assumptions, we have (in the continuous case, but the discrete case can be handled in a similar manner) that

$$\int_{-\infty}^{\infty} f(x; \theta)\,dx = 1$$

and, by taking the derivative with respect to $\theta$,

$$\int_{-\infty}^{\infty} \frac{\partial f(x; \theta)}{\partial\theta}\,dx = 0. \qquad (1)$$


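The latter expression can be rewritten as

$$\int_{-\infty}^{\infty} \frac{\partial f(x; \theta)/\partial\theta}{f(x; \theta)}\,f(x; \theta)\,dx = 0$$

or, equivalently,

$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial\theta}\,f(x; \theta)\,dx = 0.$$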


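If we differentiate again, it follows that

$$\int_{-\infty}^{\infty} \left[\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2}\,f(x; \theta) + \frac{\partial \ln f(x; \theta)}{\partial\theta}\,\frac{\partial f(x; \theta)}{\partial\theta}\right] dx = 0. \qquad (2)$$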


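We rewrite the second term of the left-hand member of this equation as

$$\int_{-\infty}^{\infty} \frac{\partial \ln f(x; \theta)}{\partial\theta}\,\frac{\partial f(x; \theta)/\partial\theta}{f(x; \theta)}\,f(x; \theta)\,dx = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x; \theta)}{\partial\theta}\right]^2 f(x; \theta)\,dx.$$

This is called Fisher information and is denoted by $I(\theta)$. That is,

$$I(\theta) = \int_{-\infty}^{\infty} \left[\frac{\partial \ln f(x; \theta)}{\partial\theta}\right]^2 f(x; \theta)\,dx;$$

but, from Equation (2), we see that $I(\theta)$ can be computed from

$$I(\theta) = -\int_{-\infty}^{\infty} \frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2}\,f(x; \theta)\,dx.$$

Sometimes, one expression is easier to compute than the other, but often we prefer the second expression.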

Remark. Note that the information is the weighted mean of either

$$\left[\frac{\partial \ln f(x; \theta)}{\partial\theta}\right]^2 \quad\text{or}\quad -\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2},$$

where the weights are given by the p.d.f. $f(x; \theta)$. That is, the greater these derivatives on the average, the more information that we get about $\theta$. Clearly, if they were equal to zero [so that $\theta$ would not be in $\ln f(x; \theta)$], there would be zero information about $\theta$. As we study more and more statistics, we learn to recognize that the function

$$\frac{\partial \ln f(x; \theta)}{\partial\theta}$$

is a very important one. For example, it played a major role in finding the m.l.e. $\hat{\theta}$ by solving

$$\sum_{i=1}^{n} \frac{\partial \ln f(x_i; \theta)}{\partial\theta} = 0$$

for $\theta$.


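Example 1. Let $X$ be $N(\theta, \sigma^2)$, where $-\infty < \theta < \infty$ and $\sigma^2$ is known. Then

$$f(x; \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\theta)^2}{2\sigma^2}\right], \qquad -\infty < x < \infty,$$

where $-\infty < \theta < \infty$, and

$$\ln f(x; \theta) = -\tfrac{1}{2}\ln(2\pi\sigma^2) - \frac{(x-\theta)^2}{2\sigma^2}.$$

Thus

$$\frac{\partial \ln f(x; \theta)}{\partial\theta} = \frac{x-\theta}{\sigma^2}$$

and

$$\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} = -\frac{1}{\sigma^2}.$$

Clearly, $E[(X-\theta)^2/\sigma^4] = -E(-1/\sigma^2) = 1/\sigma^2$. That is, in this case, it does not matter much which way we compute $I(\theta)$, as

$$I(\theta) = E\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial\theta}\right]^2\right\} \quad\text{or}\quad -E\left[\frac{\partial^2 \ln f(X; \theta)}{\partial\theta^2}\right],$$

because each is very easy. Of course, the information is greater with smaller values of $\sigma^2$.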


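Example 2. Let $X$ be binomial $b(1, \theta)$. Thus

$$\ln f(x; \theta) = x \ln\theta + (1-x)\ln(1-\theta),$$

$$\frac{\partial \ln f(x; \theta)}{\partial\theta} = \frac{x}{\theta} - \frac{1-x}{1-\theta},$$

and

$$\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} = -\frac{x}{\theta^2} - \frac{1-x}{(1-\theta)^2}.$$

Clearly,

$$I(\theta) = -E\left[-\frac{X}{\theta^2} - \frac{1-X}{(1-\theta)^2}\right] = \frac{\theta}{\theta^2} + \frac{1-\theta}{(1-\theta)^2} = \frac{1}{\theta} + \frac{1}{1-\theta} = \frac{1}{\theta(1-\theta)},$$

which is larger for $\theta$ values close to zero or 1.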
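The two expressions for Fisher information can be checked directly in the $b(1, \theta)$ case, since the expectation is a sum over $x = 0, 1$. The sketch below is only an illustration; the function names and the value $\theta = 0.2$ are hypothetical.

```python
def bernoulli_pmf(x, theta):
    """p.d.f. of b(1, theta) at x in {0, 1}."""
    return theta if x == 1 else 1.0 - theta

def score(x, theta):
    """First derivative of ln f(x; theta) with respect to theta."""
    return x / theta - (1 - x) / (1 - theta)

def second_derivative(x, theta):
    """Second derivative of ln f(x; theta) with respect to theta."""
    return -x / theta ** 2 - (1 - x) / (1 - theta) ** 2

def fisher_information_two_ways(theta):
    """Return (E[score^2], -E[second derivative]) for b(1, theta)."""
    by_square = sum(score(x, theta) ** 2 * bernoulli_pmf(x, theta)
                    for x in (0, 1))
    by_curvature = -sum(second_derivative(x, theta) * bernoulli_pmf(x, theta)
                        for x in (0, 1))
    return by_square, by_curvature

theta = 0.2  # hypothetical parameter value
info_sq, info_curv = fisher_information_two_ways(theta)
print(info_sq, info_curv, 1.0 / (theta * (1.0 - theta)))
```

Both computations agree with the closed form $1/[\theta(1-\theta)]$, as Equation (2) requires.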


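Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a distribution having p.d.f. $f(x; \theta)$. Thus the likelihood function (the joint p.d.f. of $X_1, X_2, \ldots, X_n$) is

$$L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$

Of course,

$$\ln L(\theta) = \ln f(x_1; \theta) + \ln f(x_2; \theta) + \cdots + \ln f(x_n; \theta)$$

and

$$\frac{\partial \ln L(\theta)}{\partial\theta} = \frac{\partial \ln f(x_1; \theta)}{\partial\theta} + \frac{\partial \ln f(x_2; \theta)}{\partial\theta} + \cdots + \frac{\partial \ln f(x_n; \theta)}{\partial\theta}. \qquad (3)$$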


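It seems reasonable to define the Fisher information in the random sample as

$$I_n(\theta) = E\left\{\left[\frac{\partial \ln L(\theta)}{\partial\theta}\right]^2\right\}.$$

Note if we square Equation (3), we obtain cross-product terms like

$$2E\left[\frac{\partial \ln f(X_i; \theta)}{\partial\theta}\,\frac{\partial \ln f(X_j; \theta)}{\partial\theta}\right], \qquad i \neq j,$$

which from the independence of $X_i$ and $X_j$ equals

$$2E\left[\frac{\partial \ln f(X_i; \theta)}{\partial\theta}\right] E\left[\frac{\partial \ln f(X_j; \theta)}{\partial\theta}\right] = 0.$$

The fact that this product equals zero follows immediately from Equation (1). Hence we have the result that

$$I_n(\theta) = \sum_{i=1}^{n} E\left\{\left[\frac{\partial \ln f(X_i; \theta)}{\partial\theta}\right]^2\right\}.$$

However, each term of this summation equals $I(\theta)$, and hence

$$I_n(\theta) = nI(\theta).$$

That is, the Fisher information in a random sample of size $n$ is $n$ times the Fisher information in one observation. So, in the two examples of this section, the Fisher information in a random sample of size $n$ is $n/\sigma^2$ in Example 1 and $n/[\theta(1-\theta)]$ in Example 2.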

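We can now prove a very important inequality involving the variance of an estimator, say $Y = u(X_1, X_2, \ldots, X_n)$, of $\theta$, which can be biased. Suppose that

$$E(Y) = E[u(X_1, X_2, \ldots, X_n)] = k(\theta).$$

That is, in the continuous case,

$$k(\theta) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \ldots, x_n)\, f(x_1; \theta) \cdots f(x_n; \theta)\,dx_1 \cdots dx_n;$$

$$k'(\theta) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \ldots, x_n) \left[\sum_{i=1}^{n} \frac{1}{f(x_i; \theta)}\,\frac{\partial f(x_i; \theta)}{\partial\theta}\right] f(x_1; \theta) \cdots f(x_n; \theta)\,dx_1 \cdots dx_n$$

$$= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} u(x_1, \ldots, x_n) \left[\sum_{i=1}^{n} \frac{\partial \ln f(x_i; \theta)}{\partial\theta}\right] f(x_1; \theta) \cdots f(x_n; \theta)\,dx_1 \cdots dx_n. \qquad (4)$$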


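Define the random variable $Z$ by $Z = \sum_{i=1}^{n} [\partial \ln f(X_i; \theta)/\partial\theta]$. In accordance with Equation (1) we have $E(Z) = \sum_{i=1}^{n} E[\partial \ln f(X_i; \theta)/\partial\theta] = 0$. Moreover, $Z$ is the sum of $n$ independent random variables each with mean zero and consequently with variance $E\{[\partial \ln f(X; \theta)/\partial\theta]^2\}$. Hence the variance of $Z$ is the sum of the $n$ variances,

$$\sigma_Z^2 = nE\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial\theta}\right]^2\right\} = I_n(\theta) = nI(\theta).$$

Because $Y = u(X_1, \ldots, X_n)$ and $Z = \sum_{i=1}^{n} [\partial \ln f(X_i; \theta)/\partial\theta]$, Equation (4) shows that $E(YZ) = k'(\theta)$. Recall that

$$E(YZ) = E(Y)E(Z) + \rho\sigma_Y\sigma_Z,$$

where $\rho$ is the correlation coefficient of $Y$ and $Z$. Since $E(Y) = k(\theta)$ and $E(Z) = 0$, we have

$$k'(\theta) = k(\theta) \cdot 0 + \rho\sigma_Y\sigma_Z \quad\text{or}\quad \rho = \frac{k'(\theta)}{\sigma_Y\sigma_Z}.$$

Now $\rho^2 \le 1$. Hence

$$\frac{[k'(\theta)]^2}{\sigma_Y^2\sigma_Z^2} \le 1 \quad\text{or}\quad \frac{[k'(\theta)]^2}{\sigma_Z^2} \le \sigma_Y^2.$$

If we replace $\sigma_Z^2$ by its value, we have

$$\sigma_Y^2 \ge \frac{[k'(\theta)]^2}{nE\left\{\left[\dfrac{\partial \ln f(X; \theta)}{\partial\theta}\right]^2\right\}} = \frac{[k'(\theta)]^2}{nI(\theta)}.$$

This inequality is known as the Rao-Cramér inequality.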

If $Y = u(X_1, X_2, \ldots, X_n)$ is an unbiased estimator of $\theta$, so that $k(\theta) = \theta$, then the Rao-Cramér inequality becomes, since $k'(\theta) = 1$,

$$\sigma_Y^2 \ge \frac{1}{nI(\theta)}.$$

Note that in Examples 1 and 2 of this section $1/nI(\theta)$ equals $\sigma^2/n$ and $\theta(1-\theta)/n$, respectively. In each case, the unbiased estimator, $\bar{X}$, of $\theta$, which is based upon the sufficient statistic for $\theta$, has a variance that is equal to this Rao-Cramér lower bound.

We now make the following definitions.

Definition 1. Let $Y$ be an unbiased estimator of a parameter $\theta$ in such a case of point estimation. The statistic $Y$ is called an efficient estimator of $\theta$ if and only if the variance of $Y$ attains the Rao-Cramér lower bound.

Definition 2. In cases in which we can differentiate with respect to a parameter under an integral or summation symbol, the ratio of the Rao-Cramér lower bound to the actual variance of any unbiased estimator of a parameter is called the efficiency of that statistic.

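Example 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a Poisson distribution that has the mean $\theta > 0$. It is known that $\bar{X}$ is an m.l.e. of $\theta$; we shall show that it is also an efficient estimator of $\theta$. We have

$$\frac{\partial \ln f(x; \theta)}{\partial\theta} = \frac{\partial}{\partial\theta}(x \ln\theta - \theta - \ln x!) = \frac{x}{\theta} - 1 = \frac{x - \theta}{\theta}.$$

Accordingly,

$$E\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial\theta}\right]^2\right\} = \frac{E[(X-\theta)^2]}{\theta^2} = \frac{\sigma^2}{\theta^2} = \frac{\theta}{\theta^2} = \frac{1}{\theta}.$$

The Rao-Cramér lower bound in this case is $1/[n(1/\theta)] = \theta/n$. But $\theta/n$ is the variance of $\bar{X}$. Hence $\bar{X}$ is an efficient estimator of $\theta$.

Example 4. Let $S^2$ denote the variance of a random sample of size $n > 1$ from a distribution that is $N(\mu, \theta)$, $0 < \theta < \infty$, where $\mu$ is known. We know that $E[nS^2/(n-1)] = \theta$. What is the efficiency of the estimator $nS^2/(n-1)$? We have

$$\ln f(x; \theta) = -\frac{(x-\mu)^2}{2\theta} - \frac{\ln(2\pi\theta)}{2},$$

$$\frac{\partial \ln f(x; \theta)}{\partial\theta} = \frac{(x-\mu)^2}{2\theta^2} - \frac{1}{2\theta},$$

and

$$\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} = -\frac{(x-\mu)^2}{\theta^3} + \frac{1}{2\theta^2}.$$

Accordingly,

$$-E\left[\frac{\partial^2 \ln f(X; \theta)}{\partial\theta^2}\right] = \frac{\theta}{\theta^3} - \frac{1}{2\theta^2} = \frac{1}{2\theta^2}.$$

Thus the Rao-Cramér lower bound is $2\theta^2/n$. Now $nS^2/\theta$ is $\chi^2(n-1)$, so the variance of $nS^2/\theta$ is $2(n-1)$. Accordingly, the variance of $nS^2/(n-1)$ is $2(n-1)[\theta^2/(n-1)^2] = 2\theta^2/(n-1)$. Thus the efficiency of the estimator $nS^2/(n-1)$ is $(n-1)/n$. With $\mu$ known, what is the efficient estimator of the variance?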

Example 5. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n > 2$ from a distribution with p.d.f.

$$f(x; \theta) = \theta x^{\theta-1} = \exp(\theta \ln x - \ln x + \ln\theta), \quad 0 < x < 1,$$
$$= 0 \quad \text{elsewhere}.$$

It is easy to verify that the Rao-Cramér lower bound is $\theta^2/n$. Let $Y_i = -\ln X_i$. We shall indicate that each $Y_i$ has a gamma distribution. The associated transformation $y_i = -\ln x_i$, with inverse $x_i = e^{-y_i}$, is one-to-one and the transformation maps the space $\{x_i : 0 < x_i < 1\}$ onto the space $\{y_i : 0 < y_i < \infty\}$. We have $|J| = e^{-y_i}$. Thus $Y_i$ has a gamma distribution with $\alpha = 1$ and $\beta = 1/\theta$. Let $Z = -\sum_{i=1}^{n} \ln X_i$. Then $Z$ has a gamma distribution with $\alpha = n$ and $\beta = 1/\theta$. Accordingly, we have $E(Z) = \alpha\beta = n/\theta$. This suggests that we compute the expectation of $1/Z$ to see if we can find an unbiased estimator of $\theta$. A simple integration shows that $E(1/Z) = \theta/(n-1)$. Hence $(n-1)/Z$ is an unbiased estimator of $\theta$. With $n > 2$, the variance of $(n-1)/Z$ exists and is found to be $\theta^2/(n-2)$, so that the efficiency of $(n-1)/Z$ is $(n-2)/n$. This efficiency tends to 1 as $n$ increases. In such an instance, the estimator is said to be asymptotically efficient.
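The efficiency $(n-2)/n$ in Example 5 can be reproduced from the inverse moments of a gamma variable, $E(Z^{-r}) = \theta^r\,\Gamma(n-r)/\Gamma(n)$ for $n > r$ when $Z$ is gamma with $\alpha = n$ and $\beta = 1/\theta$. The sketch below assumes that standard moment formula; the function names and the values $n = 10$, $\theta = 2$ are hypothetical.

```python
import math

def inverse_moment(n, theta, r):
    """E(Z**-r) for Z ~ gamma(alpha=n, beta=1/theta); requires n > r."""
    return theta ** r * math.gamma(n - r) / math.gamma(n)

def efficiency_of_unbiased_estimator(n, theta):
    """Efficiency of (n - 1)/Z relative to the lower bound theta^2/n."""
    m1 = inverse_moment(n, theta, 1)          # E(1/Z)
    m2 = inverse_moment(n, theta, 2)          # E(1/Z^2)
    variance = (n - 1) ** 2 * (m2 - m1 ** 2)  # var[(n - 1)/Z]
    lower_bound = theta ** 2 / n
    return lower_bound / variance

n, theta = 10, 2.0  # hypothetical sample size and parameter
eff = efficiency_of_unbiased_estimator(n, theta)
print(eff, (n - 2) / n)
```

The ratio does not depend on $\theta$ and approaches 1 as $n$ grows, which is the sense in which the estimator is asymptotically efficient.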

The concept of joint efficient estimators of several parameters has been developed along with the associated concept of joint efficiency of several estimators. But limitations of space prevent their inclusion in this book.


EXERCISES 

8.11. Prove that $\bar{X}$, the mean of a random sample of size $n$ from a distribution that is $N(\theta, \sigma^2)$, $-\infty < \theta < \infty$, is, for every known $\sigma^2 > 0$, an efficient estimator of $\theta$.

8.12. Show that the mean $\bar{X}$ of a random sample of size $n$ from a distribution which is $b(1, \theta)$, $0 < \theta < 1$, is an efficient estimator of $\theta$.

8.13. Given $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere, with $\theta > 0$, formally compute the reciprocal of

$$nE\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial\theta}\right]^2\right\}.$$

Compare this with the variance of $(n+1)Y_n/n$, where $Y_n$ is the largest item of a random sample of size $n$ from this distribution. Comment.

8.14. Given the p.d.f.

$$f(x; \theta) = \frac{1}{\pi[1 + (x-\theta)^2]}, \qquad -\infty < x < \infty, \quad -\infty < \theta < \infty.$$

Show that the Rao-Cramér lower bound is $2/n$, where $n$ is the size of a random sample from this Cauchy distribution.

8.15. Let $X$ have a gamma distribution with $\alpha = 4$ and $\beta = \theta > 0$.

(a) Find the Fisher information $I(\theta)$.

(b) If $X_1, X_2, \ldots, X_n$ is a random sample from this distribution, show that the m.l.e. of $\theta$ is an efficient estimator of $\theta$.

8.16. Let $X$ be $N(0, \theta)$, $0 < \theta < \infty$.

(a) Find the Fisher information $I(\theta)$.

(b) If $X_1, X_2, \ldots, X_n$ is a random sample from this distribution, show that the m.l.e. of $\theta$ is an efficient estimator of $\theta$.


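8.3 Limiting Distributions of Maximum Likelihood Estimators

We use the notation and assumptions of Section 8.2 as much as possible here. In particular, $f(x; \theta)$ is the p.d.f., $I(\theta)$ is the Fisher information, and the likelihood function is

$$L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$

Also, we can differentiate under the integral (summation) sign, so that

$$Z = \frac{\partial \ln L(\theta)}{\partial\theta} = \sum_{i=1}^{n} \frac{\partial \ln f(X_i; \theta)}{\partial\theta}$$

has mean zero and variance $nI(\theta)$. In addition, we want to be able to find the maximum likelihood estimator $\hat{\theta}$ by solving

$$\frac{\partial[\ln L(\theta)]}{\partial\theta} = 0.$$

That is,

$$\frac{\partial[\ln L(\hat{\theta})]}{\partial\theta} = 0,$$

where now, with $\hat{\theta}$ in this expression, $L(\hat{\theta}) = f(X_1; \hat{\theta}) \cdots f(X_n; \hat{\theta})$. We can approximate the left-hand member of this latter equation by a linear function found from the first two terms of a Taylor's series expanded about $\theta$, namely

$$\frac{\partial[\ln L(\theta)]}{\partial\theta} + (\hat{\theta} - \theta)\frac{\partial^2[\ln L(\theta)]}{\partial\theta^2} \approx 0,$$

when $L(\theta) = f(X_1; \theta) f(X_2; \theta) \cdots f(X_n; \theta)$.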

Obviously, this approximation is good enough only if $\hat{\theta}$ is close to $\theta$, and an adequate mathematical proof involves certain regularity conditions, all of which we have not given here. But a heuristic argument can be made by solving for $\hat{\theta} - \theta$ to obtain

$$\hat{\theta} - \theta = \frac{\dfrac{\partial[\ln L(\theta)]}{\partial\theta}}{-\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}} = \frac{Z}{-\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}}.$$

Let us rewrite this equation as

$$\frac{\hat{\theta} - \theta}{\sqrt{1/nI(\theta)}} = \frac{Z/\sqrt{nI(\theta)}}{-\dfrac{1}{n}\,\dfrac{\partial^2[\ln L(\theta)]}{\partial\theta^2}\Big/ I(\theta)}. \qquad (1)$$

Since $Z$ is the sum of the i.i.d. random variables

$$\frac{\partial \ln f(X_i; \theta)}{\partial\theta}, \qquad i = 1, 2, \ldots, n,$$

each with mean zero and variance $I(\theta)$, the numerator of the right-hand member of Equation (1) is limiting $N(0, 1)$ by the central limit theorem. Moreover, the mean

$$\frac{1}{n}\sum_{i=1}^{n} \left[-\frac{\partial^2 \ln f(X_i; \theta)}{\partial\theta^2}\right]$$

converges in probability to its expected value, namely $I(\theta)$. So the denominator of the right-hand member of Equation (1) converges in probability to 1. Thus, by Slutsky's theorem given in Section 5.5, the right-hand member of Equation (1) is limiting $N(0, 1)$. Hence the left-hand member also has this limiting standard normal distribution. That means that we can say that $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance $1/nI(\theta)$.

The preceding result means that in a regular case of estimation and in some limiting sense, the m.l.e. $\hat{\theta}$ is unbiased and its variance achieves the Rao-Cramér lower bound. That is, the m.l.e. $\hat{\theta}$ is asymptotically efficient.

Example 1. In Exercise 8.14 we examined the Rao-Cramér lower bound of the variance of an unbiased estimator of $\theta$, the median of a certain Cauchy distribution. We now know that the m.l.e. $\hat{\theta}$ of $\theta$ has an approximate normal distribution with mean $\theta$ and variance equal to the lower bound of $2/n$. Hence, once we compute $\hat{\theta}$, we can say, for illustration, that $\hat{\theta} \pm 1.96\sqrt{2/n}$ provides an approximate 95 percent confidence interval for $\theta$.

To determine $\hat{\theta}$, there are many numerical methods that can be used. In the Cauchy case, one of the easiest is given by the following:

$$\frac{\partial \ln L(\theta)}{\partial\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0.$$

In the denominator of the right-hand member, we use a preliminary estimate of $\theta$ that is not influenced too much by extreme observations. For illustration, the sample median, say $\hat{\theta}_0$, is a very good one while the sample mean $\bar{x}$ would be a poor choice. This provides weights

$$w_{i1} = \frac{2}{1 + (x_i - \hat{\theta}_0)^2}, \qquad i = 1, 2, \ldots, n,$$

so that we can solve

$$\sum_{i=1}^{n} w_{i1}(x_i - \theta) = 0 \quad\text{to get}\quad \hat{\theta}_1 = \frac{\sum w_{i1}x_i}{\sum w_{i1}}.$$

Now $\hat{\theta}_1$ can be used to obtain new weights and $\hat{\theta}_2$:

$$w_{i2} = \frac{2}{1 + (x_i - \hat{\theta}_1)^2}, \qquad \hat{\theta}_2 = \frac{\sum w_{i2}x_i}{\sum w_{i2}}.$$

This iterative process can be continued until adequate convergence is obtained; that is, at some step $k$, $\hat{\theta}_k$ will be close enough to $\hat{\theta}$ to be used as the m.l.e.
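The reweighting scheme of Example 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the function names, the stopping tolerance, the iteration cap, and the six data values (five clustered near 1 plus one outlier) are all hypothetical.

```python
import math
import statistics

def cauchy_score(xs, theta):
    """Derivative of ln L(theta) for the Cauchy location model."""
    return sum(2.0 * (x - theta) / (1.0 + (x - theta) ** 2) for x in xs)

def cauchy_mle(xs, tol=1e-10, max_iter=500):
    """Iteratively reweighted estimate, started at the sample median."""
    theta = statistics.median(xs)
    for _ in range(max_iter):
        # Weights use the current estimate in the denominator.
        weights = [2.0 / (1.0 + (x - theta) ** 2) for x in xs]
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

def approximate_interval(theta_hat, n, z=1.96):
    """theta_hat plus and minus z * sqrt(2/n), the approximate 95% interval."""
    half = z * math.sqrt(2.0 / n)
    return theta_hat - half, theta_hat + half

sample = [1.2, 0.8, 1.0, 1.1, 0.9, 25.0]  # hypothetical data with one outlier
theta_hat = cauchy_mle(sample)
interval = approximate_interval(theta_hat, len(sample))
print(theta_hat, interval)
```

The outlier at 25 receives a weight of roughly 0.003 against weights near 2 for the clustered values, so the estimate stays close to the bulk of the data. The likelihood equation can have several roots, and the iteration finds the one near its starting value.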


Example 2. Suppose that the random sample arises from a distribution with p.d.f.

$$f(x; \theta) = \theta x^{\theta-1}, \qquad 0 < x < 1, \quad \theta \in \Omega = \{\theta : 0 < \theta < \infty\},$$

zero elsewhere. We have

$$\ln f(x; \theta) = \ln\theta + (\theta - 1)\ln x,$$

$$\frac{\partial \ln f(x; \theta)}{\partial\theta} = \frac{1}{\theta} + \ln x,$$

and

$$\frac{\partial^2 \ln f(x; \theta)}{\partial\theta^2} = -\frac{1}{\theta^2}.$$

Since $E(-1/\theta^2) = -1/\theta^2$, the lower bound of the variance of every unbiased estimator of $\theta$ is $\theta^2/n$. Moreover, the maximum likelihood estimator $\hat{\theta} = -n/\ln\left(\prod_{i=1}^{n} X_i\right)$ has an approximate normal distribution with mean $\theta$ and variance $\theta^2/n$. Thus, in a limiting sense, $\hat{\theta}$ is the unbiased minimum variance estimator of $\theta$; that is, $\hat{\theta}$ is asymptotically efficient.

Example 3. The m.l.e. for $\theta$ in

$$f(x; \theta) = \frac{\theta^x e^{-\theta}}{x!}, \qquad x = 0, 1, 2, \ldots, \quad \theta \in \Omega = \{\theta : 0 < \theta < \infty\},$$

is $\hat{\theta} = \bar{X}$, the mean of a random sample. Now

$$\ln f(x; \theta) = x\ln\theta - \theta - \ln x!$$

and

$$\frac{\partial[\ln f(x; \theta)]}{\partial\theta} = \frac{x}{\theta} - 1 \quad\text{and}\quad \frac{\partial^2[\ln f(x; \theta)]}{\partial\theta^2} = -\frac{x}{\theta^2}.$$

Thus

$$-E\left(-\frac{X}{\theta^2}\right) = \frac{\theta}{\theta^2} = \frac{1}{\theta},$$

and $\hat{\theta} = \bar{X}$ has an approximate normal distribution with mean $\theta$ and standard deviation $\sqrt{\theta/n}$. That is, $Y = (\bar{X} - \theta)/\sqrt{\theta/n}$ has a limiting standard normal distribution. The problem in practice is how best to estimate the standard deviation in the denominator of $Y$. Clearly, we might use $\bar{X}$ for $\theta$ there, but does that create too much dependence between the numerator and denominator? If so, this requires a very large sample size for $(\bar{X} - \theta)/\sqrt{\bar{X}/n}$ to have an approximate normal distribution. It might be better to approximate $I(\theta)$ by

$$\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial \ln f(x_i; \hat{\theta})}{\partial\theta}\right]^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i}{\bar{x}} - 1\right)^2 = \frac{s^2}{\bar{x}^2}.$$

Thus $nI(\theta)$ is approximated by $ns^2/\bar{x}^2$ and we can say that

$$\frac{\sqrt{n}(\bar{X} - \theta)}{\bar{x}/s}$$

is approximately $N(0, 1)$. We do not know exactly which of these two solutions, or others like simply using $s/\sqrt{n}$ in the denominator, is best. Fortunately, however, if the Poisson model is correct, usually

$$\sqrt{\frac{\bar{x}}{n}} \approx \frac{\bar{x}}{s\sqrt{n}} \approx \frac{s}{\sqrt{n}}.$$

If this is not true, we should check the Poisson assumption, which requires, among other things, that $\mu = \sigma^2$. Hence, for illustration, either

$$\bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}} \quad\text{or}\quad \bar{x} \pm \frac{1.96\,\bar{x}}{s\sqrt{n}} \quad\text{or}\quad \bar{x} \pm \frac{1.96\,s}{\sqrt{n}}$$

serves as an approximate 95 percent confidence interval for $\theta$. In situations like this, we recommend that a person try all three because they should be in substantial agreement. If not, check the Poisson assumption.
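The recommendation to try all three intervals is easy to carry out. The sketch below is only an illustration: the function name, the dictionary labels, and the eight counts are hypothetical, and $s^2$ is computed with divisor $n$, as the sample variance is defined in this book.

```python
import math

def poisson_intervals(xs, z=1.96):
    """Three approximate 95% intervals for a Poisson mean."""
    n = len(xs)
    xbar = sum(xs) / n
    # Sample standard deviation with divisor n.
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    half_widths = {
        "sqrt(xbar/n)": z * math.sqrt(xbar / n),
        "xbar/(s*sqrt(n))": z * xbar / (s * math.sqrt(n)),
        "s/sqrt(n)": z * s / math.sqrt(n),
    }
    intervals = {name: (xbar - h, xbar + h) for name, h in half_widths.items()}
    return xbar, intervals

counts = [2, 4, 3, 5, 1, 3, 4, 2]  # hypothetical Poisson counts
xbar, intervals = poisson_intervals(counts)
for name, (low, high) in intervals.items():
    print(name, round(low, 3), round(high, 3))
```

The first half-width is always the geometric mean of the other two, so the three intervals agree closely exactly when $s^2$ is close to $\bar{x}$, which is what the Poisson assumption predicts.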
The fact that the m.l.e. $\hat{\theta}$ has an approximate normal distribution with mean $\theta$ and variance $1/nI(\theta)$ suggests that $\hat{\theta}$ (really a sequence $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_n, \ldots$) converges in probability to $\theta$. Of course, $\hat{\theta}_n$ can be biased; say $E(\hat{\theta}_n - \theta) = b_n(\theta)$, where $b_n(\theta)$ is the bias. However, $b_n(\theta)$ equals zero in the limit. Moreover, if we assume that the variances exist and

$$\lim_{n\to\infty} [\operatorname{var}(\hat{\theta}_n)] = \lim_{n\to\infty} \frac{1}{nI(\theta)},$$

then the limit of the variances is obviously zero. Hence, from Chebyshev's inequality, we have

$$\Pr[|\hat{\theta}_n - \theta| \ge \epsilon] \le \frac{E[(\hat{\theta}_n - \theta)^2]}{\epsilon^2}.$$

However,

$$\lim_{n\to\infty} E[(\hat{\theta}_n - \theta)^2] = \lim_{n\to\infty} [b_n^2(\theta) + \operatorname{var}(\hat{\theta}_n)] = 0$$

and thus

$$\lim_{n\to\infty} \Pr[|\hat{\theta}_n - \theta| \ge \epsilon] = 0$$

for each fixed $\epsilon > 0$. Any estimator, not just maximum likelihood estimators, that enjoys this property is said to be a consistent estimator of $\theta$. As illustrations, we note that all the unbiased estimators based upon the complete sufficient statistics in Chapter 7 and all the estimators in Sections 8.1 and 8.2 are consistent ones.

We close this section by considering the extension of these limiting 
distributions to maximum likelihood estimators of two or more 
parameters. For convenience, we restrict ourselves to the regular case 
involving two parameters, but the extension to more than two is 

obvious once the reader understands multivariate normal distributions 
(Section 4.10). 

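Suppose that the random sample $X_1, X_2, \ldots, X_n$ arises from a distribution with p.d.f. $f(x; \theta_1, \theta_2)$, $(\theta_1, \theta_2) \in \Omega$, in which regularity conditions exist. Without describing these conditions in any detail, let us simply say that the space of $X$ where $f(x; \theta_1, \theta_2) > 0$ does not involve $\theta_1$ and $\theta_2$, and we are able to differentiate under the integral (summation) signs. The information matrix of the sample is equal to

$$\mathbf{I}_n = n\begin{pmatrix} E\left\{\left[\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_1}\right]^2\right\} & E\left\{\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_1}\,\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_2}\right\} \\[3ex] E\left\{\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_1}\,\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_2}\right\} & E\left\{\left[\dfrac{\partial \ln f(X; \theta_1, \theta_2)}{\partial\theta_2}\right]^2\right\} \end{pmatrix}$$

$$= -n\begin{pmatrix} E\left[\dfrac{\partial^2 \ln f(X; \theta_1, \theta_2)}{\partial\theta_1^2}\right] & E\left[\dfrac{\partial^2 \ln f(X; \theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2}\right] \\[3ex] E\left[\dfrac{\partial^2 \ln f(X; \theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2}\right] & E\left[\dfrac{\partial^2 \ln f(X; \theta_1, \theta_2)}{\partial\theta_2^2}\right] \end{pmatrix}.$$

One can immediately see the similarity of this to the one-parameter case.

If $\hat{\theta}_1$ and $\hat{\theta}_2$ are maximum likelihood estimators of $\theta_1$ and $\theta_2$, then $\hat{\theta}_1$ and $\hat{\theta}_2$ have an approximate bivariate normal distribution with means $\theta_1$ and $\theta_2$ and variance-covariance matrix $\mathbf{I}_n^{-1}$. That is, the approximate variances and covariances are found, respectively, in the matrix

$$\mathbf{I}_n^{-1} = \begin{pmatrix} \operatorname{var}(\hat{\theta}_1) & \operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2) \\ \operatorname{cov}(\hat{\theta}_1, \hat{\theta}_2) & \operatorname{var}(\hat{\theta}_2) \end{pmatrix}.$$

An illustration will help us understand this result that has simply been given to the reader to accept without any mathematical derivation.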

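Example 4. Let the random sample $X_1, X_2, \ldots, X_n$ arise from $N(\theta_1, \theta_2)$. Then

$$\ln f(x; \theta_1, \theta_2) = -\tfrac{1}{2}\ln(2\pi\theta_2) - \frac{(x - \theta_1)^2}{2\theta_2},$$

$$\frac{\partial \ln f(x; \theta_1, \theta_2)}{\partial\theta_1} = \frac{x - \theta_1}{\theta_2}, \qquad \frac{\partial \ln f(x; \theta_1, \theta_2)}{\partial\theta_2} = -\frac{1}{2\theta_2} + \frac{(x - \theta_1)^2}{2\theta_2^2},$$

$$\frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial\theta_1^2} = -\frac{1}{\theta_2}, \qquad \frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial\theta_1\,\partial\theta_2} = -\frac{x - \theta_1}{\theta_2^2},$$

$$\frac{\partial^2 \ln f(x; \theta_1, \theta_2)}{\partial\theta_2^2} = \frac{1}{2\theta_2^2} - \frac{(x - \theta_1)^2}{\theta_2^3}.$$

If we take the expected value of these three second partial derivatives and multiply by $-n$, we obtain the information matrix of the sample, namely,

$$\mathbf{I}_n = \begin{pmatrix} \dfrac{n}{\theta_2} & 0 \\[2ex] 0 & \dfrac{n}{2\theta_2^2} \end{pmatrix}.$$

Hence the approximate variance-covariance matrix of the maximum likelihood estimators $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = S^2$ is

$$\mathbf{I}_n^{-1} = \begin{pmatrix} \dfrac{\theta_2}{n} & 0 \\[2ex] 0 & \dfrac{2\theta_2^2}{n} \end{pmatrix}.$$

It is not surprising that the covariance equals zero as we know that $\bar{X}$ and $S^2$ are independent. In addition, we know that

$$\operatorname{var}(\bar{X}) = \frac{\theta_2}{n}$$

and

$$\operatorname{var}(S^2) = \operatorname{var}\left[\frac{\theta_2}{n}\left(\frac{nS^2}{\theta_2}\right)\right] = \frac{\theta_2^2}{n^2}\,2(n-1) = \frac{2(n-1)\theta_2^2}{n^2},$$

since $nS^2/\theta_2$ is $\chi^2(n-1)$. While $\operatorname{var}(S^2) \ne 2\theta_2^2/n$, it is true that

$$\frac{2\theta_2^2}{n} \approx \frac{2(n-1)\theta_2^2}{n^2}$$

for large $n$.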
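A short numerical check of Example 4 may help fix the size of the discrepancy between the approximate and exact variances of $S^2$. The sketch below is illustrative only; the function names and the values $\theta_2 = 4$, $n = 50$ are hypothetical.

```python
def normal_mle_covariance(theta2, n):
    """Inverse of the information matrix diag(n/theta2, n/(2*theta2**2))."""
    info = [[n / theta2, 0.0],
            [0.0, n / (2.0 * theta2 ** 2)]]
    # A diagonal matrix is inverted by inverting each diagonal entry.
    return [[1.0 / info[0][0], 0.0],
            [0.0, 1.0 / info[1][1]]]

def exact_var_s2(theta2, n):
    """Exact variance of S^2 (divisor n) under normality."""
    return 2.0 * (n - 1) * theta2 ** 2 / n ** 2

theta2, n = 4.0, 50  # hypothetical variance and sample size
cov = normal_mle_covariance(theta2, n)
ratio = exact_var_s2(theta2, n) / cov[1][1]
print(cov, ratio)
```

The ratio of the exact to the approximate variance is $(n-1)/n$, here 0.98, so the two agree to within 2 percent at this sample size.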


EXERCISES 

8.17. Let $X_1, X_2, \ldots, X_n$ be a random sample from each of the following distributions. In each case, find the m.l.e. $\hat{\theta}$, $\operatorname{var}(\hat{\theta})$, $1/nI(\theta)$, where $I(\theta)$ is the Fisher information of a single observation $X$, and compare $\operatorname{var}(\hat{\theta})$ and $1/nI(\theta)$.

(b) $N(\theta, 1)$, $-\infty < \theta < \infty$.

(c) $N(0, \theta)$, $0 < \theta < \infty$.

(d) Gamma $(\alpha = 5, \beta = \theta)$, $0 < \theta < \infty$.

8.18. Referring to Exercise 8.17 and using the fact that $\hat{\theta}$ has an approximate $N[\theta, 1/nI(\theta)]$, in each case construct an approximate 95 percent confidence interval for $\theta$.

8.19. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal distribution with unknown means $\theta_1$ and $\theta_2$ and with known variances and correlation coefficient, $\sigma_1^2$, $\sigma_2^2$, and $\rho$, respectively. Find the maximum likelihood estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta_1$ and $\theta_2$ and their approximate variance-covariance matrix. In this case, does the latter provide the exact variances and covariance?

8.20. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal distribution with means equal to zero and variances $\theta_1$ and $\theta_2$, respectively, and known correlation coefficient $\rho$. Find the maximum likelihood estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of $\theta_1$ and $\theta_2$ and their approximate variance-covariance matrix.


8.4 Robust M-Estimation

In Example 1 of Section 8.3 we found the m.l.e. of the center $\theta$ of the Cauchy distribution with p.d.f.

$$f(x; \theta) = \frac{1}{\pi[1 + (x - \theta)^2]}, \qquad -\infty < x < \infty,$$

where $-\infty < \theta < \infty$. The logarithm of the likelihood function of a random sample $X_1, X_2, \ldots, X_n$ from this distribution is

$$\ln L(\theta) = -n\ln\pi - \sum_{i=1}^{n} \ln[1 + (x_i - \theta)^2].$$

To maximize, we differentiated $\ln L(\theta)$ to obtain

$$\frac{d\ln L(\theta)}{d\theta} = \sum_{i=1}^{n} \frac{2(x_i - \theta)}{1 + (x_i - \theta)^2} = 0.$$

The solution of this equation cannot be found in closed form, but the equation can be solved by some iterative process. There, to do this, we used the weight function

$$w(x - \hat{\theta}_0) = \frac{2}{1 + (x - \hat{\theta}_0)^2},$$

where $\hat{\theta}_0$ is some preliminary estimator of $\theta$, like the sample median. Note that values of $x$ for which $|x - \hat{\theta}_0|$ is relatively large do not have much weight. That is, in finding the maximum likelihood estimator of $\theta$, the outlying values are downweighted greatly.

The generalization of this special case is described as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with a p.d.f. of the form $f(x - \theta)$, where $\theta$ is a location parameter such that $-\infty < \theta < \infty$. Thus

$$\ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i - \theta) = -\sum_{i=1}^{n} \rho(x_i - \theta),$$

where $\rho(x) = -\ln f(x)$, and

$$\frac{d\ln L(\theta)}{d\theta} = -\sum_{i=1}^{n} \frac{f'(x_i - \theta)}{f(x_i - \theta)} = \sum_{i=1}^{n} \Psi(x_i - \theta),$$

where $\rho'(x) = \Psi(x)$. For the Cauchy distribution, we have that these functions are

$$\rho(x) = \ln\pi + \ln(1 + x^2),$$

and

$$\Psi(x) = \frac{2x}{1 + x^2}.$$

In addition, we define a weight function as

$$w(x) = \frac{\Psi(x)}{x},$$

which equals $2/(1 + x^2)$ in the Cauchy case.

To appreciate how outlying observations are handled in estimating a center $\theta$ of different models progressing from a fairly light-tailed distribution like the normal to a very heavy-tailed distribution like the Cauchy, it is an easy exercise (Exercise 8.21) to show that the standard normal distribution, with p.d.f. $\varphi(x)$, has

$$\rho(x) = \tfrac{1}{2}\ln 2\pi + \frac{x^2}{2}, \qquad \Psi(x) = x, \qquad w(x) = 1.$$

That is, in estimating the center $\theta$ in $\varphi(x - \theta)$ each value of $x$ has the weight 1 to yield the estimator $\hat{\theta} = \bar{X}$.

Also, the double exponential distribution, with p.d.f.

$$f(x) = \tfrac{1}{2}e^{-|x|}, \qquad -\infty < x < \infty,$$

has, provided that $x \ne 0$,

$$\rho(x) = \ln 2 + |x|, \qquad \Psi(x) = \operatorname{sign}(x), \qquad w(x) = \frac{\operatorname{sign}(x)}{x} = \frac{1}{|x|}.$$

Here $\hat{\theta} = \operatorname{median}(X_i)$ because in solving

$$\sum_{i=1}^{n} \Psi(x_i - \theta) = \sum_{i=1}^{n} \operatorname{sign}(x_i - \theta) = 0$$

we need as many positive values of $x_i - \theta$ as negative values. The weights in the double exponential case are of the order $1/|x - \theta|$, while those in the Cauchy case are $2/[1 + (x - \theta)^2]$. That is, in estimating the center, outliers are downweighted more severely in a Cauchy situation, as the tails of the distribution are heavier than those of the double exponential distribution. On the other hand, extreme values from the double exponential distribution are downweighted more than those under normal assumptions in arriving at an estimate of the center $\theta$.

Thus we suspect that the m.l.e. associated with one of these three distributions would not necessarily be a good estimator in another situation. This is true; for example, $\bar{X}$ is a very poor estimator of the median of a Cauchy distribution, as the variance of $\bar{X}$ does not even exist if the sample arises from a Cauchy distribution. Intuitively, $\bar{X}$ is not a good estimator with the Cauchy distribution, because the very small or very large values (outliers) that can arise from that distribution influence the mean $\bar{X}$ of the sample too much.

An estimator that is fairly good (small variance, say) for a wide variety of distributions (not necessarily the best for any one of them) is called a robust estimator. Also estimators associated with the solution of the equation

$$\sum_{i=1}^{n} \Psi(x_i - \theta) = 0$$

are frequently called robust M-estimators (denoted by $\hat{\theta}$) because they can be thought of as maximum likelihood estimators. So in finding a robust M-estimator we must select a $\Psi$ function which will provide an estimator that is good for each distribution in the collection under consideration. For certain theoretical reasons that we cannot explain at this level, Huber suggested a $\Psi$ function that is a combination of those associated with the normal and double exponential distributions,

$$\Psi(x) = \begin{cases} -k, & x < -k, \\ x, & -k \le x \le k, \\ k, & k < x, \end{cases}$$

with weight $w(x) = 1$, $|x| \le k$, and $k/|x|$, provided that $k < |x|$. In Exercise 8.23 the reader is asked to find the p.d.f. $f(x)$ so that the M-estimator associated with this $\Psi$ function is the m.l.e. of the location parameter $\theta$ in the p.d.f. $f(x - \theta)$.
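The four $\Psi$ functions discussed so far, and the weights $w(x) = \Psi(x)/x$ they imply, can be compared side by side. The following is a small sketch for illustration; the function names and the evaluation point $x = 10$ are hypothetical, and $k = 1.5$ is used for Huber's function.

```python
import math

def psi_normal(x):
    """Psi for the standard normal: every value gets weight 1."""
    return x

def psi_double_exponential(x):
    """Psi for the double exponential: the sign of x (0 at x = 0)."""
    return math.copysign(1.0, x) if x != 0 else 0.0

def psi_cauchy(x):
    """Psi for the Cauchy distribution."""
    return 2.0 * x / (1.0 + x * x)

def psi_huber(x, k=1.5):
    """Huber's Psi: linear on [-k, k], constant outside."""
    return max(-k, min(k, x))

def weight(psi, x):
    """w(x) = Psi(x)/x, defined for x != 0."""
    return psi(x) / x

x = 10.0  # a hypothetical outlying standardized value
weights = {
    "normal": weight(psi_normal, x),
    "huber": weight(psi_huber, x),
    "double exponential": weight(psi_double_exponential, x),
    "cauchy": weight(psi_cauchy, x),
}
print(weights)
```

At an outlying value the weights decrease in the order normal, Huber, double exponential, Cauchy, which mirrors the progression from light to heavy tails described above.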

With Huber's $\Psi$ function, another problem arises. Note that if we double (for illustration) each $X_1, X_2, \ldots, X_n$, estimators such as $\bar{X}$ and $\operatorname{median}(X_i)$ also double. This is not at all true with the solution of the equation

$$\sum_{i=1}^{n} \Psi(x_i - \theta) = 0,$$

where the $\Psi$ function is that of Huber. One way to avoid this difficulty is to solve another, but similar, equation instead,

$$\sum_{i=1}^{n} \Psi\left(\frac{x_i - \theta}{d}\right) = 0, \qquad (1)$$

where $d$ is a robust estimate of the scale. A popular $d$ to use is

$$d = \frac{\operatorname{median}|x_i - \operatorname{median}(x_i)|}{0.6745}.$$

The divisor 0.6745 is inserted in the definition of $d$ because then $d$ is a consistent estimate of $\sigma$ and thus is about equal to $\sigma$, if the sample arises from a normal distribution. That is, $\sigma$ can be approximated by $d$ under normal assumptions.

That scheme of selecting $d$ also provides us with a clue for selecting $k$. For if the sample actually arises from a normal distribution, we would want most of the values $x_1, x_2, \ldots, x_n$ to satisfy the inequality

$$\frac{|x_i - \theta|}{d} \le k$$

because then

$$\Psi\left(\frac{x_i - \theta}{d}\right) = \frac{x_i - \theta}{d}.$$

That is, for illustration, if all the values satisfy this inequality, then Equation (1) becomes

$$\sum_{i=1}^{n} \Psi\left(\frac{x_i - \theta}{d}\right) = \sum_{i=1}^{n} \frac{x_i - \theta}{d} = 0.$$

This has the solution $\bar{x}$, which of course is most desirable with normal distributions. Since $d$ approximates $\sigma$, popular values of $k$ to use are 1.5 and 2.0, because with those selections most normal variables would satisfy the desired inequality.

Again an iterative process must usually be used to solve Equation 
(1). One such scheme ， Newton’s method, is described. Let be a first 
estimate of 0, such as = median (x，). Approximate the left-hand 
member of Equation (1) by the first two terms of Taylor’s expansion 
about 0 O to obtain 

齊 ㈣ (夺 °， 

p f« s 

approximately. The solution of this provides a second estimate of 6, 



which is called the one-step Af-estimate of 0. If we use in place of 
沒 0 ， we obtain 沒 2 ，the two-step Af-estimate of 0. This process can 
continue to obtain any desired degree of accuracy. With Huber’s 屮 
function，the denominator of the second term, 



is particularly easy to compute because = 1, ~k <x<k, and 
zero elsewhere* Thus that denominator simply counts the number of 
， jc 2 , -, • ， such that \x t - — ^ 0 \/d < k. 
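The one-step computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the five-point sample are hypothetical (not from the text), and the code assumes $d > 0$ and at least one observation with $|x_i - \hat{\theta}_0|/d < k$.

```python
from statistics import median

def huber_psi(z, k):
    """Huber's psi: the identity on [-k, k], clipped to -k or k outside."""
    return max(-k, min(k, z))

def mad_scale(xs):
    """Robust scale d = median|x_i - median(x)| / 0.6745."""
    m = median(xs)
    return median([abs(x - m) for x in xs]) / 0.6745

def one_step_huber(xs, k=1.5):
    """One Newton step from the sample median, as in the text."""
    theta0 = median(xs)
    d = mad_scale(xs)
    z = [(x - theta0) / d for x in xs]
    numerator = sum(huber_psi(zi, k) for zi in z)
    count = sum(1 for zi in z if abs(zi) < k)  # the sum of psi'
    return theta0 + d * numerator / count

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
theta1 = one_step_huber(sample)
print(theta1, sum(sample) / len(sample))
```

For this sample the one-step estimate stays close to the median of 3, whereas the sample mean of 22 is pulled far toward the outlier.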

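Say that the scale parameter $\sigma$ is known (here $\sigma$ is not necessarily
the standard deviation, for it does not exist for a distribution like the
Cauchy). Two terms of Taylor’s expansion of

$$\sum_{i=1}^{n} \Psi\!\left(\frac{X_i - \hat{\theta}}{\sigma}\right) = 0$$

about $\theta$ provide the approximation

$$\sum_{i=1}^{n} \Psi\!\left(\frac{X_i - \theta}{\sigma}\right) - (\hat{\theta} - \theta)\,\frac{1}{\sigma} \sum_{i=1}^{n} \Psi'\!\left(\frac{X_i - \theta}{\sigma}\right) = 0.$$

This can be rewritten

$$\sqrt{n}\,(\hat{\theta} - \theta) = \frac{\sigma\,\dfrac{1}{\sqrt{n}} \sum \Psi\!\left(\dfrac{X_i - \theta}{\sigma}\right)}{\dfrac{1}{n} \sum \Psi'\!\left(\dfrac{X_i - \theta}{\sigma}\right)}. \qquad (2)$$

For the asymmetric (odd) $\Psi$ functions that we have considered,

$$E\!\left[\Psi\!\left(\frac{X - \theta}{\sigma}\right)\right] = 0,$$

provided that $X$ has a symmetric distribution about $\theta$. Clearly,

$$\operatorname{var}\!\left[\Psi\!\left(\frac{X - \theta}{\sigma}\right)\right] = E\!\left[\Psi^2\!\left(\frac{X - \theta}{\sigma}\right)\right].$$

Thus Equation (2) can be rewritten as

$$\frac{\sqrt{n}\,(\hat{\theta} - \theta)}{\sigma \sqrt{E\!\left[\Psi^2\!\left(\frac{X - \theta}{\sigma}\right)\right]} \Big/ E\!\left[\Psi'\!\left(\frac{X - \theta}{\sigma}\right)\right]} = \frac{\sum \Psi\!\left(\frac{X_i - \theta}{\sigma}\right) \Big/ \sqrt{n\,E\!\left[\Psi^2\!\left(\frac{X - \theta}{\sigma}\right)\right]}}{\frac{1}{n} \sum \Psi'\!\left(\frac{X_i - \theta}{\sigma}\right) \Big/ E\!\left[\Psi'\!\left(\frac{X - \theta}{\sigma}\right)\right]}. \qquad (3)$$

Clearly, by the central limit theorem, the numerator of the right-hand
member of Equation (3) has a limiting standardized normal
distribution, while the denominator converges in probability to 1. Thus
the left-hand member has a limiting distribution that is $N(0, 1)$. In
application we must approximate the denominator of the left-hand
member. So we say that the robust M-estimator $\hat{\theta}$ has an approximate
normal distribution with mean $\theta$ and variance

$$v = \frac{d^2}{n} \cdot \frac{\dfrac{1}{n} \sum_{i=1}^{n} \Psi^2\!\left(\dfrac{x_i - \hat{\theta}_k}{d}\right)}{\left[\dfrac{1}{n} \sum_{i=1}^{n} \Psi'\!\left(\dfrac{x_i - \hat{\theta}_k}{d}\right)\right]^2},$$

where $\hat{\theta}_k$ is the (last) $k$-step estimator of $\theta$. Of course, $\hat{\theta}$ is approximated
by $\hat{\theta}_k$; and an approximate 95 percent confidence interval for $\theta$ is
given by $\hat{\theta}_k - 1.96\sqrt{v}$ to $\hat{\theta}_k + 1.96\sqrt{v}$.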
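As a rough illustration of how the variance approximation and the confidence interval might be computed, the following sketch iterates the Newton step a fixed number of times with Huber's $\Psi$ and then evaluates $v$. The function name, the number of steps, and the sample are assumptions made for the example; the code also assumes $d > 0$ and that some observations fall inside the unclipped range at every step.

```python
from math import sqrt
from statistics import median

def huber_m_estimate(xs, k=1.5, steps=20):
    """k-step Huber M-estimate, variance approximation v, and 95% interval."""
    n = len(xs)
    theta = median(xs)
    d = median([abs(x - theta) for x in xs]) / 0.6745  # scale held fixed

    def psi(z):
        return max(-k, min(k, z))

    for _ in range(steps):
        z = [(x - theta) / d for x in xs]
        inside = sum(1 for zi in z if abs(zi) < k)      # sum of psi'
        theta += d * sum(psi(zi) for zi in z) / inside  # Newton step

    z = [(x - theta) / d for x in xs]
    mean_psi_sq = sum(psi(zi) ** 2 for zi in z) / n
    mean_psi_prime = sum(1 for zi in z if abs(zi) < k) / n
    v = d * d * mean_psi_sq / (n * mean_psi_prime ** 2)
    half_width = 1.96 * sqrt(v)
    return theta, v, (theta - half_width, theta + half_width)

sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
theta_k, v, (lower, upper) = huber_m_estimate(sample)
print(theta_k, v, lower, upper)
```

With so few observations the normal approximation is crude; the example is meant only to show the arithmetic.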

EXERCISES 

8.21. Verify that the functions $\rho(x)$, $\Psi(x)$, and $w(x)$ given in the text for the
normal and double exponential distributions are correct.

8.22. Compute the one-step M-estimate $\hat{\theta}_1$ using Huber’s $\Psi$ with $k = 1.5$ if
$n = 7$ and the seven observations are 2.1, 5.2, 2.3, 1.4, 2.2, 2.3, and 1.6. Here
take $\hat{\theta}_0 = 2.2$, the median of the sample. Compare $\hat{\theta}_1$ with $\bar{x}$.

8.23. Let the p.d.f. $f(x)$ be such that the M-estimator associated with Huber’s
$\Psi$ function is a maximum likelihood estimator of the location parameter
in $f(x - \theta)$. Show that $f(x)$ is of the form $c e^{-\rho_1(x)}$, where $\rho_1(x) = x^2/2$, $|x| \le k$,
and $\rho_1(x) = k|x| - k^2/2$, $k < |x|$.

8.24. Plot the $\Psi$ functions associated with the normal, double exponential,
and Cauchy distributions in addition to that of Huber. Why is the
M-estimator associated with the $\Psi$ function of the Cauchy distribution
called a redescending M-estimator?

8.25. Use the data in Exercise 8.22 to find the one-step redescending
M-estimator $\hat{\theta}_1$ associated with $\Psi(x) = \sin(x/1.5)$, $|x| \le 1.5\pi$, zero elsewhere.
This was first proposed by D. F. Andrews. Compare this to $\bar{x}$ and the
one-step M-estimator of Exercise 8.22. [It should be noted that there is no
p.d.f. $f(x)$ that could be associated with this $\Psi(x)$ because $\Psi(x) = 0$ if
$|x| > 1.5\pi$.]

ADDITIONAL EXERCISES 


8.26. Let $X_1, X_2, \ldots, X_n$ be a random sample from a gamma distribution with
$\alpha = 2$ and $\beta = 1/\theta$, $0 < \theta < \infty$.

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$. Is $\hat{\theta}$ unbiased?

(b) What is the approximating distribution of $\hat{\theta}$?

(c) If the prior distribution of the parameter is exponential with mean 2,
determine the Bayes’ estimator associated with a square-error loss
function.

8.27. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with p.d.f.
$f(x; \theta) = 3\theta^3 (x + \theta)^{-4}$, $0 < x < \infty$, zero elsewhere, where $0 < \theta$, show that
$Y = 2\bar{X}$ is an unbiased estimator of $\theta$ and determine its efficiency.

8.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.

$$f(x; \theta) = \frac{\theta}{(1 + x)^{\theta + 1}}, \qquad 0 < x < \infty,$$

zero elsewhere, where $0 < \theta$.

(a) Find the m.l.e., $\hat{\theta}$, of $\theta$ and argue that it is a complete sufficient
statistic for $\theta$. Is $\hat{\theta}$ unbiased?

(b) If $\hat{\theta}$ is adjusted so that it is an unbiased estimator of $\theta$, what is a lower
bound for the variance of this unbiased estimator?

8.29. If $X_1, X_2, \ldots, X_n$ is a random sample from $N(\theta, 1)$, find a lower bound
for the variance of an estimator of $k(\theta) = \theta^2$. Determine an unbiased
minimum variance estimator of $\theta^2$ and then compute its efficiency.

8.30. Suppose that we want to estimate the middle, $\theta$, of a symmetric
distribution using a robust estimator because we believe that the tails of this
distribution are much thicker than those of a normal distribution. A
$t$-distribution with 3 degrees of freedom with center at $\theta$ (not at zero) is
such a distribution, so we decide to use the m.l.e. $\hat{\theta}$ associated with that
distribution as our robust estimator. Evaluate $\hat{\theta}$ for the five observations:
10.1, 20.7, 11.3, 12.5, 6.0. Here we assume that the spread parameter is equal
to 1.

8.31. Consider the normal distribution $N(0, \theta)$. With a random sample
$X_1, X_2, \ldots, X_n$ we want to estimate the standard deviation $\sqrt{\theta}$. Find the
constant $c$ so that $Y = c \sum_{i=1}^{n} |X_i|$ is an unbiased estimator of $\sqrt{\theta}$ and
determine its efficiency.



CHAPTER 9

Theory of Statistical Tests


9.1 Certain Best Tests

In Chapter 6 we introduced many concepts associated with tests of
statistical hypotheses. In this chapter we consider some methods of
constructing good statistical tests, beginning with testing a simple
hypothesis $H_0$ against a simple alternative hypothesis $H_1$. Thus, in all
instances, the parameter space is a set that consists of exactly two
points. Under this restriction, we shall do three things:

1. Define a best test for testing $H_0$ against $H_1$.
2. Prove a theorem that provides a method of determining a best test.
3. Give two examples.

Before we define a best test, one important observation should
be made. Certainly, a test specifies a critical region; but it can also be
said that a choice of a critical region defines a test. For instance, if
one is given the critical region $C = \{(x_1, x_2, x_3) : x_1^2 + x_2^2 + x_3^2 \ge 1\}$, the
test is determined: Three random variables $X_1, X_2, X_3$ are to be
considered; if the observed values are $x_1, x_2, x_3$, accept $H_0$ if
$x_1^2 + x_2^2 + x_3^2 < 1$; otherwise, reject $H_0$. That is, the terms “test” and
“critical region” can, in this sense, be used interchangeably. Thus, if
we define a best critical region, we have defined a best test.

Let $f(x; \theta)$ denote the p.d.f. of a random variable $X$. Let
$X_1, X_2, \ldots, X_n$ denote a random sample from this distribution, and
consider the two simple hypotheses $H_0 : \theta = \theta'$ and $H_1 : \theta = \theta''$. Thus
$\Omega = \{\theta : \theta = \theta', \theta''\}$. We now define a best critical region (and hence a
best test) for testing the simple hypothesis $H_0$ against the alternative
simple hypothesis $H_1$. In this definition the symbols
$\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$ and $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1]$ mean
$\Pr[(X_1, X_2, \ldots, X_n) \in C]$ when, respectively, $H_0$ and $H_1$ are true.

Definition 1. Let $C$ denote a subset of the sample space. Then $C$ is
called a best critical region of size $\alpha$ for testing the simple hypothesis
$H_0 : \theta = \theta'$ against the alternative simple hypothesis $H_1 : \theta = \theta''$
if, for every subset $A$ of the sample space for which
$\Pr[(X_1, \ldots, X_n) \in A; H_0] = \alpha$:

(a) $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0] = \alpha$.

(b) $\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1] \ge \Pr[(X_1, X_2, \ldots, X_n) \in A; H_1]$.

This definition states, in effect, the following: First assume $H_0$ to
be true. In general, there will be a multiplicity of subsets $A$ of the
sample space such that $\Pr[(X_1, X_2, \ldots, X_n) \in A] = \alpha$. Suppose that
there is one of these subsets, say $C$, such that when $H_1$ is true, the power
of the test associated with $C$ is at least as great as the power of the test
associated with each other $A$. Then $C$ is defined as a best critical region
of size $\alpha$ for testing $H_0$ against $H_1$.

In the following example we shall examine this definition in some
detail and in a very simple case.

Example 1. Consider the one random variable $X$ that has a binomial
distribution with $n = 5$ and $p = \theta$. Let $f(x; \theta)$ denote the p.d.f. of $X$ and let
$H_0 : \theta = \frac{1}{2}$ and $H_1 : \theta = \frac{3}{4}$. The following tabulation gives, at points of positive
probability density, the values of $f(x; \frac{1}{2})$, $f(x; \frac{3}{4})$, and the ratio
$f(x; \frac{1}{2})/f(x; \frac{3}{4})$.

x                        0        1        2         3         4         5
f(x; 1/2)              1/32     5/32    10/32     10/32      5/32      1/32
f(x; 3/4)             1/1024  15/1024  90/1024  270/1024  405/1024  243/1024
f(x; 1/2)/f(x; 3/4)    32/1     32/3     32/9     32/27     32/81    32/243

We shall use one random value of $X$ to test the simple hypothesis $H_0 : \theta = \frac{1}{2}$
against the alternative simple hypothesis $H_1 : \theta = \frac{3}{4}$, and we shall first
assign the significance level of the test to be $\alpha = \frac{1}{32}$. We seek a best
critical region of size $\alpha = \frac{1}{32}$. If $A_1 = \{x : x = 0\}$ and $A_2 = \{x : x = 5\}$, then
$\Pr(X \in A_1; H_0) = \Pr(X \in A_2; H_0) = \frac{1}{32}$ and there is no other subset $A_3$ of the
space $\{x : x = 0, 1, 2, 3, 4, 5\}$ such that $\Pr(X \in A_3; H_0) = \frac{1}{32}$. Then either $A_1$ or
$A_2$ is the best critical region $C$ of size $\alpha = \frac{1}{32}$ for testing $H_0$ against $H_1$. We note
that $\Pr(X \in A_1; H_0) = \frac{1}{32}$ and that $\Pr(X \in A_1; H_1) = \frac{1}{1024}$. Thus, if the set $A_1$ is
used as a critical region of size $\alpha = \frac{1}{32}$, we have the intolerable situation that
the probability of rejecting $H_0$ when $H_1$ is true ($H_0$ is false) is much less than
the probability of rejecting $H_0$ when $H_0$ is true.

On the other hand, if the set $A_2$ is used as a critical region, then
$\Pr(X \in A_2; H_0) = \frac{1}{32}$ and $\Pr(X \in A_2; H_1) = \frac{243}{1024}$. That is, the probability of
rejecting $H_0$ when $H_1$ is true is much greater than the probability of rejecting
$H_0$ when $H_0$ is true. Certainly, this is a more desirable state of affairs, and
actually $A_2$ is the best critical region of size $\alpha = \frac{1}{32}$. The latter statement follows
from the fact that, when $H_0$ is true, there are but two subsets, $A_1$ and $A_2$, of
the sample space, each of whose probability measure is $\frac{1}{32}$, and the fact that

$$\frac{243}{1024} = \Pr(X \in A_2; H_1) > \Pr(X \in A_1; H_1) = \frac{1}{1024}.$$

It should be noted, in this problem, that the best critical region $C = A_2$ of size
$\alpha = \frac{1}{32}$ is found by including in $C$ the point (or points) at which $f(x; \frac{1}{2})$ is small
in comparison with $f(x; \frac{3}{4})$. This is seen to be true once it is observed that the
ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$ is a minimum at $x = 5$. Accordingly, the ratio
$f(x; \frac{1}{2})/f(x; \frac{3}{4})$, which is given in the last line of the above tabulation, provides
us with a precise tool by which to find a best critical region $C$ for certain given
values of $\alpha$. To illustrate this, take $\alpha = \frac{6}{32}$. When $H_0$ is true, each of the subsets
$\{x : x = 0, 1\}$, $\{x : x = 0, 4\}$, $\{x : x = 1, 5\}$, $\{x : x = 4, 5\}$ has probability measure $\frac{6}{32}$. By
direct computation it is found that the best critical region of this size is
$\{x : x = 4, 5\}$. This reflects the fact that the ratio $f(x; \frac{1}{2})/f(x; \frac{3}{4})$ has its
two smallest values for $x = 4$ and $x = 5$. The power of this test, which has
$\alpha = \frac{6}{32}$, is

$$\Pr(X = 4, 5; H_1) = \frac{405}{1024} + \frac{243}{1024} = \frac{648}{1024}.$$
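The "direct computation" in Example 1 can be reproduced by brute force. The sketch below enumerates every subset of the sample space, keeps those whose probability under $H_0$ equals the requested size exactly, and returns the one with the greatest power under $H_1$. The function names are illustrative, and exact rational arithmetic is used so that the size comparison is not affected by rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def binom_pmf(x, n, p):
    """Binomial probability Pr(X = x) with exact rational arithmetic."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def best_region(n, p0, p1, alpha):
    """Among subsets with Pr(A; H0) == alpha, return the one of largest power."""
    best, best_power = None, None
    for r in range(1, n + 2):
        for subset in combinations(range(n + 1), r):
            size = sum(binom_pmf(x, n, p0) for x in subset)
            if size != alpha:
                continue
            power = sum(binom_pmf(x, n, p1) for x in subset)
            if best_power is None or power > best_power:
                best, best_power = set(subset), power
    return best, best_power

half, three_quarters = Fraction(1, 2), Fraction(3, 4)
region_a, power_a = best_region(5, half, three_quarters, Fraction(1, 32))
region_b, power_b = best_region(5, half, three_quarters, Fraction(6, 32))
print(region_a, power_a)
print(region_b, power_b)
```

Exhaustive enumeration is feasible here only because the sample space has six points; the theorem that follows replaces the search with a likelihood-ratio rule.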


The preceding example should make the following theorem, due to
Neyman and Pearson, easier to understand. It is an important theorem
because it provides a systematic method of determining a best critical
region.

Neyman-Pearson Theorem. Let $X_1, X_2, \ldots, X_n$, where $n$ is a fixed
positive integer, denote a random sample from a distribution that has
p.d.f. $f(x; \theta)$. Then the joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(\theta; x_1, x_2, \ldots, x_n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta).$$

Let $\theta'$ and $\theta''$ be distinct fixed values of $\theta$ so that $\Omega = \{\theta : \theta = \theta', \theta''\}$, and
let $k$ be a positive number. Let $C$ be a subset of the sample space such
that:

(a) $\dfrac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k$, for each point $(x_1, x_2, \ldots, x_n) \in C$.

(b) $\dfrac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \ge k$, for each point $(x_1, x_2, \ldots, x_n) \in C^*$.

(c) $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$.

Then $C$ is a best critical region of size $\alpha$ for testing the simple hypothesis
$H_0 : \theta = \theta'$ against the alternative simple hypothesis $H_1 : \theta = \theta''$.


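Proof. We shall give the proof when the random variables are
of the continuous type. If $C$ is the only critical region of size
$\alpha$, the theorem is proved. If there is another critical region of
size $\alpha$, denote it by $A$. For convenience, we shall let
$\int \cdots \int_R L(\theta; x_1, \ldots, x_n)\, dx_1 \cdots dx_n$ be denoted by $\int_R L(\theta)$. In this
notation we wish to show that

$$\int_C L(\theta'') - \int_A L(\theta'') \ge 0.$$

Since $C$ is the union of the disjoint sets $C \cap A$ and $C \cap A^*$ and $A$ is
the union of the disjoint sets $A \cap C$ and $A \cap C^*$, we have

$$\int_C L(\theta'') - \int_A L(\theta'') = \int_{C \cap A} L(\theta'') + \int_{C \cap A^*} L(\theta'') - \int_{A \cap C} L(\theta'') - \int_{A \cap C^*} L(\theta'')$$
$$= \int_{C \cap A^*} L(\theta'') - \int_{A \cap C^*} L(\theta''). \qquad (1)$$

However, by the hypothesis of the theorem, $L(\theta'') \ge (1/k) L(\theta')$ at each
point of $C$, and hence at each point of $C \cap A^*$; thus

$$\int_{C \cap A^*} L(\theta'') \ge \frac{1}{k} \int_{C \cap A^*} L(\theta').$$

But $L(\theta'') \le (1/k) L(\theta')$ at each point of $C^*$, and hence at each point of
$A \cap C^*$; accordingly,

$$\int_{A \cap C^*} L(\theta'') \le \frac{1}{k} \int_{A \cap C^*} L(\theta').$$

These inequalities imply that

$$\int_{C \cap A^*} L(\theta'') - \int_{A \cap C^*} L(\theta'') \ge \frac{1}{k} \int_{C \cap A^*} L(\theta') - \frac{1}{k} \int_{A \cap C^*} L(\theta');$$

and, from Equation (1), we obtain

$$\int_C L(\theta'') - \int_A L(\theta'') \ge \frac{1}{k} \left[ \int_{C \cap A^*} L(\theta') - \int_{A \cap C^*} L(\theta') \right]. \qquad (2)$$

However,

$$\int_{C \cap A^*} L(\theta') - \int_{A \cap C^*} L(\theta') = \int_{C \cap A^*} L(\theta') + \int_{C \cap A} L(\theta') - \int_{A \cap C} L(\theta') - \int_{A \cap C^*} L(\theta')$$
$$= \int_C L(\theta') - \int_A L(\theta') = \alpha - \alpha = 0.$$

If this result is substituted in inequality (2), we obtain the desired
result,

$$\int_C L(\theta'') - \int_A L(\theta'') \ge 0.$$

If the random variables are of the discrete type, the proof is the same,
with integration replaced by summation.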


Remark. As stated in the theorem, conditions (a), (b), and (c) are sufficient
ones for region $C$ to be a best critical region of size $\alpha$. However, they are also
necessary. We discuss this briefly. Suppose there is a region $A$ of size $\alpha$ that
does not satisfy (a) and (b) and that is as powerful at $\theta = \theta''$ as $C$, which satisfies
(a), (b), and (c). Then expression (1) would be zero, since the power at $\theta''$ using
$A$ is equal to that using $C$. It can be proved that to have expression (1)
equal zero $A$ must be of the same form as $C$. As a matter of fact, in
the continuous case, $A$ and $C$ would essentially be the same region; that
is, they could differ only by a set having probability zero. However, in
the discrete case, if $\Pr[L(\theta') = k L(\theta''); H_0]$ is positive, $A$ and $C$ could be
different sets, but each would necessarily enjoy conditions (a), (b), and (c) to
be a best critical region of size $\alpha$.


One aspect of the theorem to be emphasized is that if we take $C$ to
be the set of all points $(x_1, x_2, \ldots, x_n)$ which satisfy

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k, \qquad k > 0,$$

then, in accordance with the theorem, $C$ will be a best critical region.
This inequality can frequently be expressed in one of the forms (where
$c_1$ and $c_2$ are constants):

$$u_1(x_1, x_2, \ldots, x_n; \theta', \theta'') \le c_1,$$

or

$$u_2(x_1, x_2, \ldots, x_n; \theta', \theta'') \ge c_2.$$

Suppose that it is the first form, $u_1 \le c_1$. Since $\theta'$ and $\theta''$ are given
constants, $u_1(X_1, X_2, \ldots, X_n; \theta', \theta'')$ is a statistic; and if the p.d.f. of this
statistic can be found when $H_0$ is true, then the significance level of the
test of $H_0$ against $H_1$ can be determined from this distribution. That
is,

$$\alpha = \Pr[u_1(X_1, X_2, \ldots, X_n; \theta', \theta'') \le c_1; H_0].$$

Moreover, the test may be based on this statistic; for, if the observed
values of $X_1, X_2, \ldots, X_n$ are $x_1, x_2, \ldots, x_n$, we reject $H_0$ (accept $H_1$) if
$u_1(x_1, x_2, \ldots, x_n) \le c_1$.

A positive number $k$ determines a best critical region $C$ whose size
is $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$ for that particular $k$. It may be that
this value of $\alpha$ is unsuitable for the purpose at hand; that is, it is too
large or too small. However, if there is a statistic $u_1(X_1, X_2, \ldots, X_n)$,
as in the preceding paragraph, whose p.d.f. can be determined when
$H_0$ is true, we need not experiment with various values of $k$ to
obtain a desirable significance level. For if the distribution of the
statistic is known, or can be found, we may determine $c_1$ such that
$\Pr[u_1(X_1, X_2, \ldots, X_n) \le c_1; H_0]$ is a desirable significance level.
An illustrative example follows.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from the
distribution that has the p.d.f.

$$f(x; \theta) = \frac{1}{\sqrt{2\pi}} \exp\!\left[-\frac{(x - \theta)^2}{2}\right], \qquad -\infty < x < \infty.$$

It is desired to test the simple hypothesis $H_0 : \theta = \theta' = 0$ against the alternative
simple hypothesis $H_1 : \theta = \theta'' = 1$. Now

$$\frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)} = \frac{(1/\sqrt{2\pi})^n \exp\!\left[-\sum_{i=1}^{n} x_i^2 / 2\right]}{(1/\sqrt{2\pi})^n \exp\!\left[-\sum_{i=1}^{n} (x_i - 1)^2 / 2\right]} = \exp\!\left(-\sum_{i=1}^{n} x_i + \frac{n}{2}\right).$$

If $k > 0$, the set of all points $(x_1, x_2, \ldots, x_n)$ such that

$$\exp\!\left(-\sum_{i=1}^{n} x_i + \frac{n}{2}\right) \le k$$

is a best critical region. This inequality holds if and only if

$$-\sum_{i=1}^{n} x_i + \frac{n}{2} \le \ln k$$

or, equivalently,

$$\sum_{i=1}^{n} x_i \ge \frac{n}{2} - \ln k = c.$$

In this case, a best critical region is the set $C = \{(x_1, x_2, \ldots, x_n) : \sum_{i=1}^{n} x_i \ge c\}$,
where $c$ is a constant that can be determined so that the size of the critical
region is a desired number $\alpha$. The event $\sum_{i=1}^{n} X_i \ge c$ is equivalent to the event
$\bar{X} \ge c/n = c_1$, say, so the test may be based upon the statistic $\bar{X}$. If $H_0$ is true,
that is, $\theta = \theta' = 0$, then $\bar{X}$ has a distribution that is $N(0, 1/n)$. For a given
positive integer $n$, the size of the sample, and a given significance level $\alpha$, the
number $c_1$ can be found from Table III in Appendix B, so that
$\Pr(\bar{X} \ge c_1; H_0) = \alpha$. Hence, if the experimental values of $X_1, X_2, \ldots, X_n$ were,
respectively, $x_1, x_2, \ldots, x_n$, we would compute $\bar{x} = \sum_{i=1}^{n} x_i / n$. If $\bar{x} \ge c_1$, the
simple hypothesis $H_0 : \theta = \theta' = 0$ would be rejected at the significance level
$\alpha$; if $\bar{x} < c_1$, the hypothesis $H_0$ would be accepted. The probability of rejecting
$H_0$, when $H_0$ is true, is $\alpha$; the probability of rejecting $H_0$, when $H_0$ is false,
is the value of the power of the test at $\theta = \theta'' = 1$. That is,

$$\Pr(\bar{X} \ge c_1; H_1) = \int_{c_1}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1/n}} \exp\!\left[-\frac{(\bar{x} - 1)^2}{2(1/n)}\right] d\bar{x}.$$

For example, if $n = 25$ and if $\alpha$ is selected to be 0.05, then from Table III
we find that $c_1 = 1.645/\sqrt{25} = 0.329$. Thus the power of this best test of $H_0$
against $H_1$ is 0.05, when $H_0$ is true, and is

$$\int_{0.329}^{\infty} \frac{1}{\sqrt{2\pi}\sqrt{1/25}} \exp\!\left[-\frac{(\bar{x} - 1)^2}{2(1/25)}\right] d\bar{x} = \int_{-3.355}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw = 0.999+,$$

when $H_1$ is true.
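The numbers in Example 2 can be checked without a normal table. The following sketch, which uses Python's standard `statistics.NormalDist` and illustrative variable names, computes the critical value $c_1$ for $n = 25$ and $\alpha = 0.05$ and then the power at $\theta = 1$.

```python
from math import sqrt
from statistics import NormalDist

n, alpha = 25, 0.05
theta_null, theta_alt = 0.0, 1.0
std_error = 1 / sqrt(n)  # the sample mean is N(theta, 1/n)

# c1 is the upper-alpha point of the null distribution of the sample mean.
c1 = NormalDist(theta_null, std_error).inv_cdf(1 - alpha)

# Power: probability that the sample mean is at least c1 when theta = 1.
power = 1 - NormalDist(theta_alt, std_error).cdf(c1)
print(round(c1, 3), power)
```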


There is another aspect of this theorem that warrants special
mention. It has to do with the number of parameters that appear in
the p.d.f. Our notation suggests that there is but one parameter.
However, a careful review of the proof will reveal that nowhere was
this needed or assumed. The p.d.f. may depend upon any finite number
of parameters. What is essential is that the hypothesis $H_0$ and the
alternative hypothesis $H_1$ be simple, namely that they completely
specify the distributions. With this in mind, we see that the simple
hypotheses $H_0$ and $H_1$ do not need to be hypotheses about the
parameters of a distribution, nor, as a matter of fact, do the random
variables $X_1, X_2, \ldots, X_n$ need to be independent. That is, if $H_0$ is the
simple hypothesis that the joint p.d.f. is $g(x_1, x_2, \ldots, x_n)$, and if $H_1$
is the alternative simple hypothesis that the joint p.d.f. is
$h(x_1, x_2, \ldots, x_n)$, then $C$ is a best critical region of size $\alpha$ for testing $H_0$
against $H_1$ if, for $k > 0$:

1. $\dfrac{g(x_1, x_2, \ldots, x_n)}{h(x_1, x_2, \ldots, x_n)} \le k$ for $(x_1, x_2, \ldots, x_n) \in C$.

2. $\dfrac{g(x_1, x_2, \ldots, x_n)}{h(x_1, x_2, \ldots, x_n)} \ge k$ for $(x_1, x_2, \ldots, x_n) \in C^*$.

3. $\alpha = \Pr[(X_1, X_2, \ldots, X_n) \in C; H_0]$.

An illustrative example follows.

Example 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution
which has a p.d.f. $f(x)$ that is positive on and only on the nonnegative
integers. It is desired to test the simple hypothesis

$$H_0 : f(x) = \frac{e^{-1}}{x!}, \qquad x = 0, 1, 2, \ldots,$$
$$\phantom{H_0 : f(x)} = 0 \quad \text{elsewhere},$$

against the alternative simple hypothesis

$$H_1 : f(x) = \left(\tfrac{1}{2}\right)^{x+1}, \qquad x = 0, 1, 2, \ldots,$$
$$\phantom{H_1 : f(x)} = 0 \quad \text{elsewhere}.$$

Here

$$\frac{g(x_1, \ldots, x_n)}{h(x_1, \ldots, x_n)} = \frac{e^{-n}/(x_1!\, x_2! \cdots x_n!)}{\left(\frac{1}{2}\right)^n \left(\frac{1}{2}\right)^{x_1 + x_2 + \cdots + x_n}} = \frac{(2e^{-1})^n\, 2^{\sum x_i}}{\prod_{i=1}^{n} (x_i!)}.$$

If $k > 0$, the set of points $(x_1, x_2, \ldots, x_n)$ such that

$$\left(\sum_{i=1}^{n} x_i\right) \ln 2 - \ln\!\left[\prod_{i=1}^{n} (x_i!)\right] \le \ln k - n \ln(2e^{-1}) = c$$

is a best critical region $C$. Consider the case of $k = 1$ and $n = 1$. The preceding
inequality may be written $2^{x_1}/x_1! \le e/2$. This inequality is satisfied by all points
in the set $C = \{x_1 : x_1 = 0, 3, 4, 5, \ldots\}$. Thus the power of the test when $H_0$
is true is

$$\Pr(X_1 \in C; H_0) = 1 - \Pr(X_1 = 1, 2; H_0) = 0.448,$$

approximately, in accordance with Table I of Appendix B. The power of the
test when $H_1$ is true is given by

$$\Pr(X_1 \in C; H_1) = 1 - \Pr(X_1 = 1, 2; H_1) = 1 - \left(\tfrac{1}{4} + \tfrac{1}{8}\right) = 0.625.$$
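The case $k = 1$, $n = 1$ of Example 3 can be verified directly. The sketch below lists the points that satisfy $2^{x}/x! \le e/2$ and sums the two probability functions over them. Truncating the support at 60 is an assumption of the example; the neglected tail probability is negligible under both hypotheses.

```python
from math import e, exp, factorial

def in_region(x):
    """Likelihood-ratio condition for k = 1, n = 1: 2^x / x! <= e / 2."""
    return 2**x / factorial(x) <= e / 2

# Truncate the nonnegative integers at 60 (an assumption of this sketch).
points = [x for x in range(60) if in_region(x)]

size = sum(exp(-1) / factorial(x) for x in points)  # Pr(X in C; H0)
power = sum(0.5 ** (x + 1) for x in points)         # Pr(X in C; H1)
print(points[:4], round(size, 3), round(power, 3))
```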

Remark. In the notation of this section, say $C$ is a critical region such that

$$\alpha = \int_C L(\theta') \qquad \text{and} \qquad \beta = \int_{C^*} L(\theta''),$$

so that here $\alpha$ and $\beta$ equal the respective probabilities of the type I and type
II errors associated with $C$. Let $d_1$ and $d_2$ be two given positive constants.
Consider a certain linear function of $\alpha$ and $\beta$, namely

$$d_1 \int_C L(\theta') + d_2 \int_{C^*} L(\theta'') = d_1 \int_C L(\theta') + d_2 \left[1 - \int_C L(\theta'')\right]$$
$$= d_2 + \int_C \left[d_1 L(\theta') - d_2 L(\theta'')\right].$$

If we wished to minimize this expression, we would select $C$ to be the set of
all $(x_1, x_2, \ldots, x_n)$ such that

$$d_1 L(\theta') - d_2 L(\theta'') < 0$$

or, equivalently,

$$\frac{L(\theta')}{L(\theta'')} < \frac{d_2}{d_1}, \qquad \text{for all } (x_1, x_2, \ldots, x_n) \in C,$$

which according to the Neyman-Pearson theorem provides a best critical
region with $k = d_2/d_1$. That is, this critical region $C$ is one that minimizes
$d_1 \alpha + d_2 \beta$. There could be others, for example, including points on which
$L(\theta')/L(\theta'') = d_2/d_1$, but these would still be best critical regions according to
the Neyman-Pearson theorem.
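The remark can be illustrated with the binomial setting of Example 1 ($n = 5$, $\theta' = \frac{1}{2}$, $\theta'' = \frac{3}{4}$). The sketch below compares the region given by the rule $d_1 L(\theta') < d_2 L(\theta'')$ with a brute-force minimization of $d_1 \alpha + d_2 \beta$ over all subsets of the sample space. The choice $d_1 = d_2 = 1$ and the function names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def pmf(x, p, n=5):
    """Binomial probability Pr(X = x) with exact rational arithmetic."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p_null, p_alt = Fraction(1, 2), Fraction(3, 4)
d1, d2 = 1, 1  # hypothetical weights on the two error probabilities
points = range(6)

def cost(region):
    """d1 * alpha + d2 * beta for the critical region `region`."""
    alpha = sum(pmf(x, p_null) for x in region)
    beta = 1 - sum(pmf(x, p_alt) for x in region)
    return d1 * alpha + d2 * beta

all_regions = [set(r) for size in range(7) for r in combinations(points, size)]
brute_force = min(all_regions, key=cost)
rule = {x for x in points if d1 * pmf(x, p_null) < d2 * pmf(x, p_alt)}
print(brute_force, rule, cost(rule))
```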

EXERCISES 

9.1. In Example 2 of this section, let the simple hypotheses read
$H_0 : \theta = \theta' = 0$ and $H_1 : \theta = \theta'' = -1$. Show that the best test of $H_0$ against
$H_1$ may be carried out by use of the statistic $\bar{X}$, and that if $n = 25$ and
$\alpha = 0.05$, the power of the test is 0.999+ when $H_1$ is true.

9.2. Let the random variable $X$ have the p.d.f. $f(x; \theta) = (1/\theta) e^{-x/\theta}$,
$0 < x < \infty$, zero elsewhere. Consider the simple hypothesis $H_0 : \theta = \theta' = 2$
and the alternative hypothesis $H_1 : \theta = \theta'' = 4$. Let $X_1, X_2$ denote a
random sample of size 2 from this distribution. Show that the best test
of $H_0$ against $H_1$ may be carried out by use of the statistic $X_1 + X_2$ and
that the assertion in Example 2 of Section 6.4 is correct.

9.3. Repeat Exercise 9.2 when $H_1 : \theta = \theta'' = 6$. Generalize this for every
$\theta'' > 2$.

9.4. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a normal
distribution $N(0, \sigma^2)$. Find a best critical region of size $\alpha = 0.05$ for testing
$H_0 : \sigma^2 = 1$ against $H_1 : \sigma^2 = 2$. Is this a best critical region of size 0.05 for
testing $H_0 : \sigma^2 = 1$ against $H_1 : \sigma^2 = 4$? Against $H_1 : \sigma^2 = \sigma_1^2 > 1$?

9.5. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution having
p.d.f. of the form $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, show
that a best critical region for testing $H_0 : \theta = 1$ against $H_1 : \theta = 2$ is
$C = \left\{(x_1, x_2, \ldots, x_n) : c \le \prod_{i=1}^{n} x_i\right\}$.

9.6. Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a distribution that is
$N(\theta_1, \theta_2)$. Find a best test of the simple hypothesis $H_0 : \theta_1 = \theta_1' = 0$,
$\theta_2 = \theta_2' = 1$ against the alternative simple hypothesis $H_1 : \theta_1 = \theta_1'' = 1$,
$\theta_2 = \theta_2'' = 4$.

9.7. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution
$N(\theta, 100)$. Show that $C = \left\{(x_1, x_2, \ldots, x_n) : c \le \bar{x} = \sum_{i=1}^{n} x_i / n\right\}$ is a best
critical region for testing $H_0 : \theta = 75$ against $H_1 : \theta = 78$. Find $n$ and $c$ so that

$$\Pr[(X_1, X_2, \ldots, X_n) \in C; H_0] = \Pr(\bar{X} \ge c; H_0) = 0.05$$

and

$$\Pr[(X_1, X_2, \ldots, X_n) \in C; H_1] = \Pr(\bar{X} \ge c; H_1) = 0.90,$$

approximately.

9.8. If $X_1, X_2, \ldots, X_n$ is a random sample from a beta distribution with
parameters $\alpha = \beta = \theta > 0$, find a best critical region for testing $H_0 : \theta = 1$
against $H_1 : \theta = 2$.

9.9. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a distribution having
the p.d.f. $f(x; p) = p^x (1 - p)^{1 - x}$, $x = 0, 1$, zero elsewhere. Show that
$C = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n} x_i \le c\right\}$ is a best critical region for testing $H_0 : p = \frac{1}{2}$
against $H_1 : p = \frac{1}{3}$. Use the central limit theorem to find $n$ and $c$ so that
approximately $\Pr\!\left(\sum_{i=1}^{n} X_i \le c; H_0\right) = 0.10$ and $\Pr\!\left(\sum_{i=1}^{n} X_i \le c; H_1\right) = 0.80$.

9.10. Let $X_1, X_2, \ldots, X_{10}$ denote a random sample of size 10 from a Poisson
distribution with mean $\theta$. Show that the critical region $C$ defined by $\sum_{i=1}^{10} x_i \ge 3$
is a best critical region for testing $H_0 : \theta = 0.1$ against $H_1 : \theta = 0.5$.
Determine, for this test, the significance level $\alpha$ and the power at $\theta = 0.5$.


9.2 Uniformly Most Powerful Tests


This section will take up the problem of a test of a simple hypothesis
$H_0$ against an alternative composite hypothesis $H_1$. We begin
with an example.

Example 1. Consider the p.d.f.

$$f(x; \theta) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad 0 < x < \infty,$$
$$\phantom{f(x; \theta)} = 0 \quad \text{elsewhere},$$

of Example 2, Section 6.4, and later of Exercise 9.3. It is desired to test the
simple hypothesis $H_0 : \theta = 2$ against the alternative composite hypothesis
$H_1 : \theta > 2$. Thus $\Omega = \{\theta : \theta \ge 2\}$. A random sample, $X_1, X_2$, of size $n = 2$ will
be used, and the critical region is $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$. It was
shown in the example cited that the significance level of the test is
approximately 0.05 and that the power of the test when $\theta = 4$ is approximately
0.31. The power function $K(\theta)$ of the test for all $\theta \ge 2$ will now be obtained.
We have

$$K(\theta) = 1 - \int_0^{9.5} \int_0^{9.5 - x_2} \frac{1}{\theta^2} \exp\!\left(-\frac{x_1 + x_2}{\theta}\right) dx_1\, dx_2 = \left(\frac{\theta + 9.5}{\theta}\right) e^{-9.5/\theta}, \qquad 2 \le \theta.$$

For example, $K(2) = 0.05$, $K(4) = 0.31$, and $K(9.5) = 2/e$. It is known
(Exercise 9.3) that $C = \{(x_1, x_2) : 9.5 \le x_1 + x_2 < \infty\}$ is a best critical region
of size 0.05 for testing the simple hypothesis $H_0 : \theta = 2$ against each simple
hypothesis in the composite hypothesis $H_1 : \theta > 2$.
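The closed form of the power function in Example 1, $K(\theta) = [(\theta + 9.5)/\theta]\, e^{-9.5/\theta}$, is easy to evaluate. A minimal sketch follows; the function name and the default argument are illustrative.

```python
from math import e, exp

def power(theta, c=9.5):
    """K(theta) = Pr(X1 + X2 >= c) for two exponential observations, mean theta."""
    return (theta + c) / theta * exp(-c / theta)

for theta in (2, 4, 9.5):
    print(theta, round(power(theta), 4))
```

The printed values agree with the rounded figures quoted in the example, and $K(\theta)$ increases with $\theta$, as one would hope for a test against $\theta > 2$.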

The preceding example affords an illustration of a test of a simple
hypothesis $H_0$ that is a best test of $H_0$ against every simple hypothesis
in the alternative composite hypothesis $H_1$. We now define a critical
region, when it exists, which is a best critical region for testing a simple
hypothesis $H_0$ against an alternative composite hypothesis $H_1$. It seems
desirable that this critical region should be a best critical region for
testing $H_0$ against each simple hypothesis in $H_1$. That is, the power
function of the test that corresponds to this critical region should be
at least as great as the power function of any other test with the same
significance level for every simple hypothesis in $H_1$.

Definition 2. The critical region $C$ is a uniformly most powerful
critical region of size $\alpha$ for testing the simple hypothesis $H_0$ against an
alternative composite hypothesis $H_1$ if the set $C$ is a best critical region
of size $\alpha$ for testing $H_0$ against each simple hypothesis in $H_1$. A test
defined by this critical region $C$ is called a uniformly most powerful test,
with significance level $\alpha$, for testing the simple hypothesis $H_0$ against
the alternative composite hypothesis $H_1$.

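As will be seen presently, uniformly most powerful tests do not
always exist. However, when they do exist, the Neyman-Pearson
theorem provides a technique for finding them. Some illustrative
examples are given here.

Example 2. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(0, \theta)$, where the variance $\theta$ is an unknown positive
number. It will be shown that there exists a uniformly most powerful test with
significance level $\alpha$ for testing the simple hypothesis $H_0 : \theta = \theta'$, where $\theta'$ is a
fixed positive number, against the alternative composite hypothesis
$H_1 : \theta > \theta'$. Thus $\Omega = \{\theta : \theta \ge \theta'\}$. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(\theta; x_1, x_2, \ldots, x_n) = \left(\frac{1}{2\pi\theta}\right)^{n/2} \exp\!\left(-\frac{\sum_{i=1}^{n} x_i^2}{2\theta}\right).$$

Let $\theta''$ represent a number greater than $\theta'$, and let $k$ denote a positive number.
Let $C$ be the set of points where

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} \le k,$$

that is, the set of points where

$$\left(\frac{\theta''}{\theta'}\right)^{n/2} \exp\!\left[-\left(\frac{\theta'' - \theta'}{2\theta'\theta''}\right) \sum_{i=1}^{n} x_i^2\right] \le k$$

or, equivalently,

$$\sum_{i=1}^{n} x_i^2 \ge \frac{2\theta'\theta''}{\theta'' - \theta'} \left[\frac{n}{2} \ln\!\left(\frac{\theta''}{\theta'}\right) - \ln k\right] = c.$$

The set $C = \left\{(x_1, x_2, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is then a best critical region for
testing the simple hypothesis $H_0 : \theta = \theta'$ against the simple hypothesis $\theta = \theta''$.
It remains to determine $c$ so that this critical region has the desired size $\alpha$.
If $H_0$ is true, the random variable $\sum_{i=1}^{n} X_i^2/\theta'$ has a chi-square distribution with $n$
degrees of freedom. Since $\alpha = \Pr\!\left(\sum_{i=1}^{n} X_i^2/\theta' \ge c/\theta'; H_0\right)$, $c/\theta'$ may be
read from Table II in Appendix B and $c$ determined. Then
$C = \left\{(x_1, x_2, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is a best critical region of size $\alpha$ for testing
$H_0 : \theta = \theta'$ against the hypothesis $\theta = \theta''$. Moreover, for each number $\theta''$
greater than $\theta'$, the foregoing argument holds. That is, if $\theta'''$ is another
number greater than $\theta'$, then $C = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is a best critical
region of size $\alpha$ for testing $H_0 : \theta = \theta'$ against the hypothesis $\theta = \theta'''$.
Accordingly, $C = \left\{(x_1, \ldots, x_n) : \sum_{i=1}^{n} x_i^2 \ge c\right\}$ is a uniformly most powerful
critical region of size $\alpha$ for testing $H_0 : \theta = \theta'$ against $H_1 : \theta > \theta'$. If
$x_1, x_2, \ldots, x_n$ denote the experimental values of $X_1, X_2, \ldots, X_n$, then
$H_0 : \theta = \theta'$ is rejected at the significance level $\alpha$, and $H_1 : \theta > \theta'$ is accepted,
if $\sum_{i=1}^{n} x_i^2 \ge c$; otherwise, $H_0 : \theta = \theta'$ is accepted.

If, in the preceding discussion, we take $n = 15$, $\alpha = 0.05$, and $\theta' = 3$, then
here the two hypotheses will be $H_0 : \theta = 3$ and $H_1 : \theta > 3$. From Table II,
$c/3 = 25$ and hence $c = 75$.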
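The table lookup at the end of Example 2 can be checked numerically. Python's standard library has no chi-square distribution, so the sketch below integrates the chi-square density with Simpson's rule; this numerical shortcut, the function names, and the step count are assumptions of the example (the integration as written is intended for more than two degrees of freedom, where the density vanishes at zero). It then evaluates the power function $K(\theta) = \Pr(\chi^2_{15} \ge 75/\theta)$ of the uniformly most powerful test.

```python
import math

def chi2_pdf(t, df):
    """Chi-square density; returns 0 at t <= 0 (valid for df > 2)."""
    if t <= 0.0:
        return 0.0
    half = df / 2.0
    log_pdf = ((half - 1.0) * math.log(t) - t / 2.0
               - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_pdf)

def chi2_upper_tail(x, df, steps=4000):
    """Pr(chi2_df >= x), using Simpson's rule on [0, x]; steps must be even."""
    if x <= 0.0:
        return 1.0
    h = x / steps
    total = chi2_pdf(0.0, df) + chi2_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * chi2_pdf(i * h, df)
    return 1.0 - total * h / 3.0

def power(theta, c=75.0, n=15):
    """K(theta) = Pr(sum of X_i^2 >= c) when the variance is theta."""
    return chi2_upper_tail(c / theta, n)

print(power(3.0), power(6.0), power(12.0))
```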

Example 3. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution that is $N(\theta, 1)$, where the mean $\theta$ is unknown. It will be
shown that there is no uniformly most powerful test of the simple
hypothesis $H_0 : \theta = \theta'$, where $\theta'$ is a fixed number, against the alternative
composite hypothesis $H_1 : \theta \ne \theta'$. Thus $\Omega = \{\theta : -\infty < \theta < \infty\}$. Let $\theta''$ be a
number not equal to $\theta'$. Let $k$ be a positive number and consider

$$\frac{(1/2\pi)^{n/2} \exp\!\left[-\sum_{i=1}^{n} (x_i - \theta')^2 / 2\right]}{(1/2\pi)^{n/2} \exp\!\left[-\sum_{i=1}^{n} (x_i - \theta'')^2 / 2\right]} \le k.$$

The preceding inequality may be written as

$$\exp\!\left\{-(\theta'' - \theta') \sum_{i=1}^{n} x_i + \frac{n}{2}\left[(\theta'')^2 - (\theta')^2\right]\right\} \le k$$

or

$$(\theta'' - \theta') \sum_{i=1}^{n} x_i \ge \frac{n}{2}\left[(\theta'')^2 - (\theta')^2\right] - \ln k.$$

This last inequality is equivalent to

$$\sum_{i=1}^{n} x_i \ge \frac{n}{2}(\theta'' + \theta') - \frac{\ln k}{\theta'' - \theta'},$$

provided that $\theta'' > \theta'$, and it is equivalent to

$$\sum_{i=1}^{n} x_i \le \frac{n}{2}(\theta'' + \theta') - \frac{\ln k}{\theta'' - \theta'}$$

if $\theta'' < \theta'$. The first of these two expressions defines a best critical region for
testing $H_0 : \theta = \theta'$ against the hypothesis $\theta = \theta''$ provided that $\theta'' > \theta'$, while
the second expression defines a best critical region for testing $H_0 : \theta = \theta'$
against the hypothesis $\theta = \theta''$ provided that $\theta'' < \theta'$. That is, a best critical
region for testing the simple hypothesis against an alternative simple
hypothesis, say $\theta = \theta' + 1$, will not serve as a best critical region for testing
$H_0 : \theta = \theta'$ against the alternative simple hypothesis $\theta = \theta' - 1$, say. By
definition, then, there is no uniformly most powerful test in the case under
consideration.

It should be noted that had the alternative composite hypothesis been
either $H_1 : \theta > \theta'$ or $H_1 : \theta < \theta'$, a uniformly most powerful test would exist
in each instance.
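The failure of a single region to serve both directions can be seen numerically. The sketch below takes the hypothetical values $n = 25$, $\alpha = 0.05$, and $\theta' = 0$ (not from the text) and computes the power of the right-tailed and left-tailed best tests at $\theta = 1$ and $\theta = -1$.

```python
from math import sqrt
from statistics import NormalDist

n, alpha, theta_null = 25, 0.05, 0.0  # hypothetical illustrative values
std_error = 1 / sqrt(n)
z = NormalDist().inv_cdf(1 - alpha)

def power_right(theta):
    """Power of the test rejecting when the sample mean >= theta' + z * se."""
    return 1 - NormalDist(theta, std_error).cdf(theta_null + z * std_error)

def power_left(theta):
    """Power of the test rejecting when the sample mean <= theta' - z * se."""
    return NormalDist(theta, std_error).cdf(theta_null - z * std_error)

for theta in (-1.0, 1.0):
    print(theta, power_right(theta), power_left(theta))
```

Each test has size 0.05, yet each is nearly powerless against alternatives on the wrong side of $\theta'$, so neither can be uniformly most powerful against $\theta \ne \theta'$.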

Example 4. In Exercise 9.10 the reader was asked to show that if a
random sample of size $n = 10$ is taken from a Poisson distribution with
mean $\theta$, the critical region defined by $\sum_{1}^{10} x_i \ge 3$ is a best critical region for
testing $H_0: \theta = 0.1$ against $H_1: \theta = 0.5$. This critical region is also a uniformly
most powerful one for testing $H_0: \theta = 0.1$ against $H_1: \theta > 0.1$ because, with
$\theta'' > 0.1$,

$$\frac{(0.1)^{\sum x_i} e^{-10(0.1)} / (x_1!\, x_2! \cdots x_n!)}{(\theta'')^{\sum x_i} e^{-10\theta''} / (x_1!\, x_2! \cdots x_n!)} \le k$$

is equivalent to

$$\left(\frac{0.1}{\theta''}\right)^{\sum x_i} e^{-10(0.1 - \theta'')} \le k.$$

The preceding inequality may be written as

$$\left(\sum x_i\right)(\ln 0.1 - \ln \theta'') \le \ln k + 10(0.1 - \theta'')$$

or, since $\theta'' > 0.1$, equivalently as

$$\sum x_i \ge \frac{\ln k + 1 - 10\theta''}{\ln 0.1 - \ln \theta''}.$$

Of course, $\sum_{1}^{10} x_i \ge 3$ is of the latter form.

Let us make an observation, although obvious when pointed out,
that is important. Let $X_1, X_2, \ldots, X_n$ denote a random sample from
a distribution that has p.d.f. $f(x; \theta)$, $\theta \in \Omega$. Suppose that
$Y = u(X_1, X_2, \ldots, X_n)$ is a sufficient statistic for $\theta$. In accordance with
the factorization theorem, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ may be
written

$$L(\theta; x_1, x_2, \ldots, x_n) = k_1[u(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n),$$

where $k_2(x_1, x_2, \ldots, x_n)$ does not depend upon $\theta$. Consequently, the
ratio

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} = \frac{k_1[u(x_1, x_2, \ldots, x_n); \theta']}{k_1[u(x_1, x_2, \ldots, x_n); \theta'']}$$

depends upon $x_1, x_2, \ldots, x_n$ only through $u(x_1, x_2, \ldots, x_n)$. Accordingly,
if there is a sufficient statistic $Y = u(X_1, X_2, \ldots, X_n)$ for $\theta$ and
if a best test or a uniformly most powerful test is desired, there is no
need to consider tests which are based upon any statistic other than the
sufficient statistic. This result supports the importance of sufficiency.

Often, when $\theta'' < \theta'$, the ratio

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)},$$

which depends upon $x_1, x_2, \ldots, x_n$ only through $y = u(x_1, x_2, \ldots, x_n)$,
is an increasing function of $y = u(x_1, x_2, \ldots, x_n)$. In such a case
we say that we have a monotone likelihood ratio in the statistic
$Y = u(X_1, X_2, \ldots, X_n)$.
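The size and power of the Poisson test in Example 4 can be checked numerically. The sketch below assumes the standard fact that the sum of $n$ independent Poisson observations with mean $\theta$ is Poisson with mean $n\theta$; the function names `poisson_cdf` and `power` are illustrative and do not come from the text.

```python
import math

def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu), summed term by term."""
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k + 1))

def power(theta, n=10, c=3):
    """K(theta) = P(sum of X_i >= c), where the sum is Poisson(n * theta)."""
    return 1.0 - poisson_cdf(c - 1, n * theta)

# Size of the test under H0: theta = 0.1, then power at a few alternatives.
size = power(0.1)
for theta in (0.1, 0.2, 0.5, 1.0):
    print(f"theta = {theta:.1f}   K(theta) = {power(theta):.4f}")
```

The same critical region is used at every alternative $\theta > 0.1$, which is what makes the region uniformly most powerful; only the value of the power changes with $\theta$.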

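Example 5. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Bernoulli
distribution with parameter $p = \theta$, where $0 < \theta < 1$. Let $\theta'' < \theta'$. Then the
ratio

$$\frac{L(\theta'; x_1, x_2, \ldots, x_n)}{L(\theta''; x_1, x_2, \ldots, x_n)} = \frac{(\theta')^{\sum x_i}(1 - \theta')^{n - \sum x_i}}{(\theta'')^{\sum x_i}(1 - \theta'')^{n - \sum x_i}} = \left[\frac{\theta'(1 - \theta'')}{\theta''(1 - \theta')}\right]^{\sum x_i} \left(\frac{1 - \theta'}{1 - \theta''}\right)^{n}.$$

Since $\theta'/\theta'' > 1$ and $(1 - \theta'')/(1 - \theta') > 1$, so that $\theta'(1 - \theta'')/[\theta''(1 - \theta')] > 1$,
the ratio is an increasing function of $y = \sum x_i$. Thus we have a monotone
likelihood ratio in the statistic $Y = \sum X_i$.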
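The monotonicity claimed in Example 5 is easy to confirm numerically. In this sketch the values $n = 10$, $\theta' = 0.6$, and $\theta'' = 0.3$ are hypothetical choices made only for illustration.

```python
def likelihood_ratio(y, n, theta1, theta2):
    """L(theta1) / L(theta2) for a Bernoulli sample with y successes in n trials."""
    num = theta1**y * (1 - theta1) ** (n - y)
    den = theta2**y * (1 - theta2) ** (n - y)
    return num / den

n, theta_p, theta_pp = 10, 0.6, 0.3   # theta'' < theta'
ratios = [likelihood_ratio(y, n, theta_p, theta_pp) for y in range(n + 1)]

# The ratio should grow strictly as y = sum of x_i increases.
is_increasing = all(a < b for a, b in zip(ratios, ratios[1:]))
print(is_increasing)
```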


We can generalize Example 5 by noting the following. Suppose that
the random sample $X_1, X_2, \ldots, X_n$ arises from a p.d.f. representing a
regular case of the exponential class, namely

$$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)], \quad x \in \mathcal{A},$$

and $f(x; \theta) = 0$ elsewhere, where the space $\mathcal{A}$ of $X$ is free of $\theta$. Further
assume that $p(\theta)$ is an increasing function of $\theta$. Then

$$\frac{L(\theta')}{L(\theta'')} = \frac{\exp\left[p(\theta')\sum_{1}^{n} K(x_i) + \sum_{1}^{n} S(x_i) + nq(\theta')\right]}{\exp\left[p(\theta'')\sum_{1}^{n} K(x_i) + \sum_{1}^{n} S(x_i) + nq(\theta'')\right]}$$

$$= \exp\left\{[p(\theta') - p(\theta'')]\sum_{1}^{n} K(x_i) + n[q(\theta') - q(\theta'')]\right\}.$$

If $\theta'' < \theta'$, $p(\theta)$ being an increasing function requires this ratio to be
an increasing function of $y = \sum_{1}^{n} K(x_i)$. Thus we have a monotone
likelihood ratio in the statistic $Y = \sum_{1}^{n} K(X_i)$. Moreover, if we test
$H_0: \theta = \theta'$ against $H_1: \theta < \theta'$, then, with $\theta'' < \theta'$, we see that

$$\frac{L(\theta')}{L(\theta'')} \le k$$

is equivalent to $\sum K(x_i) \le c$ for every $\theta'' < \theta'$. That is, this provides a
uniformly most powerful critical region.

If, in the preceding situation with monotone likelihood ratio, we
test $H_0: \theta = \theta'$ against $H_1: \theta > \theta'$, then $\sum K(x_i) \ge c$ would be a
uniformly most powerful critical region. From the likelihood ratios
displayed in Examples 2, 3, 4, and 5 we see immediately that the
respective critical regions

$$\sum_{i=1}^{n} x_i^2 \ge c, \qquad \sum_{i=1}^{n} x_i \ge c, \qquad \sum_{i=1}^{n} x_i \ge c, \qquad \sum_{i=1}^{n} x_i \ge c$$

are uniformly most powerful for testing $H_0: \theta = \theta'$ against $H_1: \theta > \theta'$.

There is a final remark that should be made about uniformly most
powerful tests. Of course, in Definition 2, the word uniformly is
associated with $\theta$; that is, $C$ is a best critical region of size $\alpha$ for testing
$H_0: \theta = \theta_0$ against all $\theta$ values given by the composite alternative $H_1$.
However, suppose that the form of such a region is

$$u(x_1, x_2, \ldots, x_n) \le c.$$

Then this form provides uniformly most powerful critical regions for
all attainable $\alpha$ values by, of course, appropriately changing the value
of $c$. That is, there is a certain uniformity property, also associated
with $\alpha$, that is not always noted in statistics texts.

EXERCISES

9.11. Let $X$ have the p.d.f. $f(x; \theta) = \theta^x(1 - \theta)^{1 - x}$, $x = 0, 1$, zero elsewhere.
We test the simple hypothesis $H_0: \theta = \frac{1}{4}$ against the alternative composite
hypothesis $H_1: \theta < \frac{1}{4}$ by taking a random sample of size 10 and rejecting
$H_0: \theta = \frac{1}{4}$ if and only if the observed values $x_1, x_2, \ldots, x_{10}$ of the sample
observations are such that $\sum_{1}^{10} x_i \le 1$. Find the power function $K(\theta)$,
$0 < \theta \le \frac{1}{4}$, of this test.

9.12. Let $X$ have a p.d.f. of the form $f(x; \theta) = 1/\theta$, $0 < x < \theta$, zero elsewhere.
Let $Y_1 < Y_2 < Y_3 < Y_4$ denote the order statistics of a random sample of
size 4 from this distribution. Let the observed value of $Y_4$ be $y_4$. We reject
$H_0: \theta = 1$ and accept $H_1: \theta \ne 1$ if either $y_4 \le \frac{1}{2}$ or $y_4 \ge 1$. Find the power
function $K(\theta)$, $0 < \theta$, of the test.

9.13. Consider a normal distribution of the form $N(\theta, 4)$. The simple
hypothesis $H_0: \theta = 0$ is rejected, and the alternative composite hypothesis
$H_1: \theta > 0$ is accepted if and only if the observed mean $\bar{x}$ of a random sample
of size 25 is greater than or equal to $\frac{3}{5}$. Find the power function $K(\theta)$, $0 \le \theta$,
of this test.

9.14. Consider the two normal distributions $N(\mu_1, 400)$ and $N(\mu_2, 225)$. Let
$\theta = \mu_1 - \mu_2$. Let $\bar{x}$ and $\bar{y}$ denote the observed means of two independent
random samples, each of size $n$, from these two distributions. We reject
$H_0: \theta = 0$ and accept $H_1: \theta > 0$ if and only if $\bar{x} - \bar{y} \ge c$. If $K(\theta)$ is the
power function of this test, find $n$ and $c$ so that $K(0) = 0.05$ and $K(10) =
0.90$, approximately.

9.15. If, in Example 2 of this section, $H_0: \theta = \theta'$, where $\theta'$ is a fixed positive
number, and $H_1: \theta < \theta'$, show that the set $\{(x_1, x_2, \ldots, x_n): \sum_{1}^{n} x_i^2 \le c\}$ is
a uniformly most powerful critical region for testing $H_0$ against $H_1$.

9.16. If, in Example 2 of this section, $H_0: \theta = \theta'$, where $\theta'$ is a fixed positive
number, and $H_1: \theta \ne \theta'$, show that there is no uniformly most powerful test
for testing $H_0$ against $H_1$.

9.17. Let $X_1, X_2, \ldots, X_{25}$ denote a random sample of size 25 from a normal
distribution $N(\theta, 100)$. Find a uniformly most powerful critical region of
size $\alpha = 0.10$ for testing $H_0: \theta = 75$ against $H_1: \theta > 75$.

9.18. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a normal distribution
$N(\theta, 16)$. Find the sample size $n$ and a uniformly most powerful test of
$H_0: \theta = 25$ against $H_1: \theta < 25$ with power function $K(\theta)$ so that
approximately $K(25) = 0.10$ and $K(23) = 0.90$.

9.19. Consider a distribution having a p.d.f. of the form $f(x; \theta) =
\theta^x(1 - \theta)^{1 - x}$, $x = 0, 1$, zero elsewhere. Let $H_0: \theta = \frac{1}{20}$ and $H_1: \theta > \frac{1}{20}$. Use
the central limit theorem to determine the sample size $n$ of a random sample
so that a uniformly most powerful test of $H_0$ against $H_1$ has a power function
$K(\theta)$, with approximately $K(\frac{1}{20}) = 0.05$ and $K(\frac{1}{10}) = 0.90$.

9.20. Illustrative Example 1 of this section dealt with a random sample of size
$n = 2$ from a gamma distribution with $\alpha = 1$, $\beta = \theta$. Thus the m.g.f. of the
distribution is $(1 - \theta t)^{-1}$, $t < 1/\theta$, $\theta \ge 2$. Let $Z = X_1 + X_2$. Show that $Z$
has a gamma distribution with $\alpha = 2$, $\beta = \theta$. Express the power function
$K(\theta)$ of Example 1 in terms of a single integral. Generalize this for a random
sample of size $n$.

9.21. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with p.d.f.
$f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere, where $\theta > 0$. Find a sufficient
statistic for $\theta$ and show that a uniformly most powerful test of $H_0: \theta = 6$
against $H_1: \theta < 6$ is based on this statistic.

9.22. Let $X$ have the p.d.f. $f(x; \theta) = \theta^x(1 - \theta)^{1 - x}$, $x = 0, 1$, zero elsewhere.
We test $H_0: \theta = \frac{1}{2}$ against $H_1: \theta < \frac{1}{2}$ by taking a random sample
$X_1, X_2, \ldots, X_5$ of size $n = 5$ and rejecting $H_0$ if $Y = \sum_{1}^{5} X_i$ is observed to be
less than or equal to a constant $c$.

(a) Show that this is a uniformly most powerful test.

(b) Find the significance level when $c = 1$.

(c) Find the significance level when $c = 0$.

(d) By using a randomized test, modify the tests given in parts (b) and (c)
to find a test with significance level $\alpha = \frac{2}{32}$.

9.3 Likelihood Ratio Tests


The notion of using the magnitude of the ratio of two probability
density functions as the basis of a best test or of a uniformly most
powerful test can be modified, and made intuitively appealing, to
provide a method of constructing a test of a composite hypothesis
against an alternative composite hypothesis or of constructing a test
of a simple hypothesis against an alternative composite hypothesis
when a uniformly most powerful test does not exist. This method leads
to tests called likelihood ratio tests. A likelihood ratio test, as just
remarked, is not necessarily a uniformly most powerful test, but it has
been proved in the literature that such a test often has desirable
properties.

A certain terminology and notation will be introduced by means
of an example.

Example 1. Let the random variable $X$ be $N(\theta_1, \theta_2)$ and let the parameter
space be $\Omega = \{(\theta_1, \theta_2): -\infty < \theta_1 < \infty,\ 0 < \theta_2 < \infty\}$. Let the composite
hypothesis be $H_0: \theta_1 = 0$, $\theta_2 > 0$, and let the alternative composite hypothesis
be $H_1: \theta_1 \ne 0$, $\theta_2 > 0$. The set $\omega = \{(\theta_1, \theta_2): \theta_1 = 0,\ 0 < \theta_2 < \infty\}$ is a subset
of $\Omega$ and will be called the subspace specified by the hypothesis $H_0$. Then, for
instance, the hypothesis $H_0$ may be described as $H_0: (\theta_1, \theta_2) \in \omega$. It is proposed
that we test $H_0$ against all alternatives in $H_1$.

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n > 1$ from the
distribution of this example. The joint p.d.f. of $X_1, X_2, \ldots, X_n$ is, at each
point in $\Omega$,

$$L(\theta_1, \theta_2; x_1, \ldots, x_n) = \left(\frac{1}{2\pi\theta_2}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n}(x_i - \theta_1)^2}{2\theta_2}\right] = L(\Omega).$$

At each point $(\theta_1, \theta_2) \in \omega$, the joint p.d.f. of $X_1, X_2, \ldots, X_n$ is

$$L(0, \theta_2; x_1, \ldots, x_n) = \left(\frac{1}{2\pi\theta_2}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n} x_i^2}{2\theta_2}\right] = L(\omega).$$

The joint p.d.f., now denoted by $L(\omega)$, is not completely specified, since $\theta_2$ may
be any positive number; nor is the joint p.d.f., now denoted by $L(\Omega)$,
completely specified, since $\theta_1$ may be any real number and $\theta_2$ any positive
number. Thus the ratio of $L(\omega)$ to $L(\Omega)$ could not provide a basis for a
test of $H_0$ against $H_1$. Suppose, however, that we modify this ratio in the
following manner. We shall find the maximum of $L(\omega)$ in $\omega$, that is, the
maximum of $L(\omega)$ with respect to $\theta_2$. And we shall find the maximum of
$L(\Omega)$ in $\Omega$, that is, the maximum of $L(\Omega)$ with respect to $\theta_1$ and $\theta_2$. The ratio
of these maxima will be taken as the criterion for a test of $H_0$ against $H_1$.
Let the maximum of $L(\omega)$ in $\omega$ be denoted by $L(\hat{\omega})$ and let the maximum of
$L(\Omega)$ in $\Omega$ be denoted by $L(\hat{\Omega})$. Then the criterion for the test of $H_0$ against
$H_1$ is the likelihood ratio

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})}.$$

Since $L(\omega)$ and $L(\Omega)$ are probability density functions, $\lambda \ge 0$; and since $\omega$ is
a subset of $\Omega$, $\lambda \le 1$.

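In our example the maximum, $L(\hat{\omega})$, of $L(\omega)$ is obtained by first setting

$$\frac{d \ln L(\omega)}{d\theta_2} = -\frac{n}{2\theta_2} + \frac{\sum_{1}^{n} x_i^2}{2\theta_2^2}$$

equal to zero and solving for $\theta_2$. The solution of $\theta_2$ is $\sum_{1}^{n} x_i^2/n$, and this number
maximizes $L(\omega)$. Thus the maximum is

$$L(\hat{\omega}) = \left(\frac{1}{2\pi \sum_{1}^{n} x_i^2/n}\right)^{n/2} \exp\left[-\frac{\sum_{1}^{n} x_i^2}{2\sum_{1}^{n} x_i^2/n}\right] = \left(\frac{n e^{-1}}{2\pi \sum_{1}^{n} x_i^2}\right)^{n/2}.$$

On the other hand, by using Example 4, Section 6.1, the maximum, $L(\hat{\Omega})$, of
$L(\Omega)$ is obtained by replacing $\theta_1$ and $\theta_2$ by $\sum_{1}^{n} x_i/n = \bar{x}$ and $\sum_{1}^{n}(x_i - \bar{x})^2/n$,
respectively. That is,

$$L(\hat{\Omega}) = \left[\frac{1}{2\pi \sum_{1}^{n}(x_i - \bar{x})^2/n}\right]^{n/2} \exp\left[-\frac{\sum_{1}^{n}(x_i - \bar{x})^2}{2\sum_{1}^{n}(x_i - \bar{x})^2/n}\right] = \left[\frac{n e^{-1}}{2\pi \sum_{1}^{n}(x_i - \bar{x})^2}\right]^{n/2}.$$

Thus here

$$\lambda = \left[\frac{\sum_{1}^{n}(x_i - \bar{x})^2}{\sum_{1}^{n} x_i^2}\right]^{n/2}.$$

Because $\sum_{1}^{n} x_i^2 = \sum_{1}^{n}(x_i - \bar{x})^2 + n\bar{x}^2$, $\lambda$ may be written

$$\lambda = \frac{1}{\left\{1 + \left[n\bar{x}^2 \Big/ \sum_{1}^{n}(x_i - \bar{x})^2\right]\right\}^{n/2}}.$$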

Now the hypothesis $H_0$ is $\theta_1 = 0$, $\theta_2 > 0$. If the observed number $\bar{x}$ were zero,
the experiment tends to confirm $H_0$. But if $\bar{x} = 0$ and $\sum_{1}^{n} x_i^2 > 0$, then $\lambda = 1$.
On the other hand, if $\bar{x}$ and $n\bar{x}^2 / \sum_{1}^{n}(x_i - \bar{x})^2$ deviate considerably from zero,
the experiment tends to negate $H_0$. Now the greater the deviation of
$n\bar{x}^2 / \sum_{1}^{n}(x_i - \bar{x})^2$ from zero, the smaller $\lambda$ becomes. That is, if $\lambda$ is used as a test
criterion, then an intuitively appealing critical region for testing $H_0$ is a set
defined by $0 \le \lambda \le \lambda_0$, where $\lambda_0$ is a positive proper fraction. Thus we reject
$H_0$ if $\lambda \le \lambda_0$. A test that has the critical region $\lambda \le \lambda_0$ is a likelihood ratio test.
In this example $\lambda \le \lambda_0$ when and only when

$$\frac{\sqrt{n}\,|\bar{x}|}{\sqrt{\sum_{1}^{n}(x_i - \bar{x})^2/(n - 1)}} \ge \sqrt{(n - 1)(\lambda_0^{-2/n} - 1)} = c.$$

If $H_0: \theta_1 = 0$ is true, the results in Section 4.8 show that the statistic

$$t(X_1, X_2, \ldots, X_n) = \frac{\sqrt{n}\,(\bar{X} - 0)}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}}$$

has a $t$-distribution with $n - 1$ degrees of freedom. Accordingly, in this
example the likelihood ratio test of $H_0$ against $H_1$ may be based on a $T$-statistic.
For a given positive integer $n$, Table IV in Appendix B may be used (with $n - 1$
degrees of freedom) to determine the number $c$ such that
$\alpha = \Pr[|t(X_1, X_2, \ldots, X_n)| \ge c;\ H_0]$ is the desired significance level of the test.
If the experimental values of $X_1, X_2, \ldots, X_n$ are, respectively, $x_1, x_2, \ldots, x_n$,
then we reject $H_0$ if and only if $|t(x_1, x_2, \ldots, x_n)| \ge c$. If, for instance, $n = 6$
and $\alpha = 0.05$, then from Table IV, $c = 2.571$.
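The relationship between $\lambda$ and the $t$-statistic in this example can be checked on data. The sketch below uses a small made-up sample of size $n = 6$ (the numbers are hypothetical, not from the text), computes $\lambda$ directly from the two maximized likelihoods, and compares it with $(1 + t^2/(n - 1))^{-n/2}$, which follows from the last displayed form of $\lambda$. The function name `lrt_one_sample` is illustrative.

```python
import math

def lrt_one_sample(xs):
    """Return (lambda, t) for testing H0: theta1 = 0 with a normal sample xs."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)   # sum of squared deviations from xbar
    sum_sq = sum(x * x for x in xs)         # sum of x_i^2
    lam = (ss / sum_sq) ** (n / 2)          # L(omega-hat) / L(Omega-hat)
    t = math.sqrt(n) * xbar / math.sqrt(ss / (n - 1))
    return lam, t

xs = [0.9, -0.4, 1.7, 0.3, 1.1, 0.6]        # hypothetical observations
n = len(xs)
lam, t = lrt_one_sample(xs)

# lambda is a decreasing function of t^2, so small lambda means large |t|.
lam_from_t = (1 + t * t / (n - 1)) ** (-n / 2)
reject_h0 = abs(t) >= 2.571                 # c for n = 6, alpha = 0.05 (from the text)
print(lam, lam_from_t, t, reject_h0)
```

Because $\lambda$ depends on the data only through $t^2$, choosing $\lambda_0$ and choosing $c$ are two ways of stating the same critical region.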

The preceding example should make the following generalization
easier to read: Let $X_1, X_2, \ldots, X_n$ denote $n$ independent random
variables having, respectively, the probability density functions
$f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m)$, $i = 1, 2, \ldots, n$. The set that consists of all
parameter points $(\theta_1, \theta_2, \ldots, \theta_m)$ is denoted by $\Omega$, which we have
called the parameter space. Let $\omega$ be a subset of the parameter
space $\Omega$. We wish to test the (simple or composite) hypothesis
$H_0: (\theta_1, \theta_2, \ldots, \theta_m) \in \omega$ against all alternative hypotheses. Define the
likelihood functions

$$L(\omega) = \prod_{i=1}^{n} f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m), \qquad (\theta_1, \theta_2, \ldots, \theta_m) \in \omega,$$

and

$$L(\Omega) = \prod_{i=1}^{n} f_i(x_i; \theta_1, \theta_2, \ldots, \theta_m), \qquad (\theta_1, \theta_2, \ldots, \theta_m) \in \Omega.$$

Let $L(\hat{\omega})$ and $L(\hat{\Omega})$ be the maxima, which we assume to exist, of these
two likelihood functions. The ratio of $L(\hat{\omega})$ to $L(\hat{\Omega})$ is called the
likelihood ratio and is denoted by

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})}.$$

Let $\lambda_0$ be a positive proper fraction. The likelihood ratio test principle
states that the hypothesis $H_0: (\theta_1, \theta_2, \ldots, \theta_m) \in \omega$ is rejected if and
only if

$$\lambda(x_1, x_2, \ldots, x_n) = \lambda \le \lambda_0.$$

The function $\lambda$ defines a random variable $\lambda(X_1, X_2, \ldots, X_n)$, and the
significance level of the test is given by

$$\alpha = \Pr[\lambda(X_1, X_2, \ldots, X_n) \le \lambda_0;\ H_0].$$

The likelihood ratio test principle is an intuitive one. However,
the principle does lead to the same test, when testing a simple
hypothesis $H_0$ against an alternative simple hypothesis $H_1$, as that given
by the Neyman-Pearson theorem (Exercise 9.25). Thus it might be
expected that a test based on this principle has some desirable
properties.

An example of the preceding generalization will be given.

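Example 2. Let the independent random variables $X$ and $Y$ have
distributions that are $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_3)$, where the means $\theta_1$ and $\theta_2$ and
common variance $\theta_3$ are unknown. Then $\Omega = \{(\theta_1, \theta_2, \theta_3): -\infty < \theta_1 < \infty,\
-\infty < \theta_2 < \infty,\ 0 < \theta_3 < \infty\}$. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ denote
independent random samples from these distributions. The hypothesis
$H_0: \theta_1 = \theta_2$, unspecified, and $\theta_3$ unspecified, is to be tested against all
alternatives. Then $\omega = \{(\theta_1, \theta_2, \theta_3): -\infty < \theta_1 = \theta_2 < \infty,\ 0 < \theta_3 < \infty\}$. Here
$X_1, X_2, \ldots, X_n, Y_1, Y_2, \ldots, Y_m$ are $n + m > 2$ mutually independent random
variables having the likelihood functions

$$L(\omega) = \left(\frac{1}{2\pi\theta_3}\right)^{(n+m)/2} \exp\left\{-\frac{\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_1)^2}{2\theta_3}\right\}$$

and

$$L(\Omega) = \left(\frac{1}{2\pi\theta_3}\right)^{(n+m)/2} \exp\left\{-\frac{\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_2)^2}{2\theta_3}\right\}.$$

If

$$\frac{\partial \ln L(\omega)}{\partial \theta_1} \quad \text{and} \quad \frac{\partial \ln L(\omega)}{\partial \theta_3}$$

are equated to zero, then (Exercise 9.26)

$$\sum_{1}^{n}(x_i - \theta_1) + \sum_{1}^{m}(y_i - \theta_1) = 0,$$

$$-(n + m) + \frac{1}{\theta_3}\left[\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_1)^2\right] = 0. \qquad (1)$$

The solutions for $\theta_1$ and $\theta_3$ are, respectively,

$$u = \frac{\sum_{1}^{n} x_i + \sum_{1}^{m} y_i}{n + m}$$

and

$$w = \frac{\sum_{1}^{n}(x_i - u)^2 + \sum_{1}^{m}(y_i - u)^2}{n + m},$$

and $u$ and $w$ maximize $L(\omega)$. The maximum is

$$L(\hat{\omega}) = \left(\frac{e^{-1}}{2\pi w}\right)^{(n+m)/2}.$$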


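In like manner, if

$$\frac{\partial \ln L(\Omega)}{\partial \theta_1}, \qquad \frac{\partial \ln L(\Omega)}{\partial \theta_2}, \qquad \frac{\partial \ln L(\Omega)}{\partial \theta_3}$$

are equated to zero, then (Exercise 9.27)

$$\sum_{1}^{n}(x_i - \theta_1) = 0,$$

$$\sum_{1}^{m}(y_i - \theta_2) = 0, \qquad (2)$$

$$-(n + m) + \frac{1}{\theta_3}\left[\sum_{1}^{n}(x_i - \theta_1)^2 + \sum_{1}^{m}(y_i - \theta_2)^2\right] = 0.$$

The solutions for $\theta_1$, $\theta_2$, and $\theta_3$ are, respectively,

$$u_1 = \frac{\sum_{1}^{n} x_i}{n}, \qquad u_2 = \frac{\sum_{1}^{m} y_i}{m}, \qquad w' = \frac{\sum_{1}^{n}(x_i - u_1)^2 + \sum_{1}^{m}(y_i - u_2)^2}{n + m},$$

and $u_1$, $u_2$, and $w'$ maximize $L(\Omega)$. The maximum is

$$L(\hat{\Omega}) = \left(\frac{e^{-1}}{2\pi w'}\right)^{(n+m)/2},$$

so that

$$\lambda(x_1, \ldots, x_n, y_1, \ldots, y_m) = \lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \left(\frac{w'}{w}\right)^{(n+m)/2}.$$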

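The random variable defined by $\lambda^{2/(n+m)}$ is

$$\frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sum_{1}^{n}\{X_i - [(n\bar{X} + m\bar{Y})/(n + m)]\}^2 + \sum_{1}^{m}\{Y_i - [(n\bar{X} + m\bar{Y})/(n + m)]\}^2}.$$

Now

$$\sum_{1}^{n}\left(X_i - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \sum_{1}^{n}\left[(X_i - \bar{X}) + \left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)\right]^2 = \sum_{1}^{n}(X_i - \bar{X})^2 + n\left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2$$

and

$$\sum_{1}^{m}\left(Y_i - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \sum_{1}^{m}\left[(Y_i - \bar{Y}) + \left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)\right]^2 = \sum_{1}^{m}(Y_i - \bar{Y})^2 + m\left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2.$$

But

$$n\left(\bar{X} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \frac{m^2 n}{(n + m)^2}(\bar{X} - \bar{Y})^2$$

and

$$m\left(\bar{Y} - \frac{n\bar{X} + m\bar{Y}}{n + m}\right)^2 = \frac{n^2 m}{(n + m)^2}(\bar{X} - \bar{Y})^2.$$

Hence the random variable defined by $\lambda^{2/(n+m)}$ may be written

$$\frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2 + [nm/(n + m)](\bar{X} - \bar{Y})^2} = \frac{1}{1 + \dfrac{[nm/(n + m)](\bar{X} - \bar{Y})^2}{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}}.$$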

If the hypothesis $H_0: \theta_1 = \theta_2$ is true, the random variable

$$T = \frac{\sqrt{\dfrac{nm}{n + m}}\,(\bar{X} - \bar{Y})}{\sqrt{\dfrac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{n + m - 2}}}$$

has, in accordance with Section 6.3, a $t$-distribution with $n + m - 2$ degrees
of freedom. Thus the random variable defined by $\lambda^{2/(n+m)}$ is

$$\frac{n + m - 2}{(n + m - 2) + T^2}.$$

The test of $H_0$ against all alternatives may then be based on a $t$-distribution
with $n + m - 2$ degrees of freedom.

The likelihood ratio principle calls for the rejection of $H_0$ if and only if
$\lambda \le \lambda_0 < 1$. Thus the significance level of the test is

$$\alpha = \Pr[\lambda(X_1, \ldots, X_n, Y_1, \ldots, Y_m) \le \lambda_0;\ H_0].$$

However, $\lambda(X_1, \ldots, X_n, Y_1, \ldots, Y_m) \le \lambda_0$ is equivalent to $|T| \ge c$, and so

$$\alpha = \Pr(|T| \ge c;\ H_0).$$

For given values of $n$ and $m$, the number $c$ is determined from Table IV in
Appendix B (with $n + m - 2$ degrees of freedom) in such a manner as to yield
a desired $\alpha$. Then $H_0$ is rejected at a significance level $\alpha$ if and only if $|t| \ge c$,
where $t$ is the experimental value of $T$. If, for instance, $n = 10$, $m = 6$, and
$\alpha = 0.05$, then $c = 2.145$.
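The identity $\lambda^{2/(n+m)} = (n + m - 2)/[(n + m - 2) + T^2]$ from Example 2 can be verified numerically. The sketch below uses two small made-up samples (hypothetical values, not from the text); the function name `two_sample_lrt` is illustrative.

```python
import math

def two_sample_lrt(xs, ys):
    """Return (lambda^(2/(n+m)), T) for testing equal means with a common variance."""
    n, m = len(xs), len(ys)
    xbar, ybar = sum(xs) / n, sum(ys) / m
    u = (sum(xs) + sum(ys)) / (n + m)        # pooled mean, the estimate under H0
    # (n + m) * w': squared deviations about the separate sample means
    ss_within = sum((x - xbar) ** 2 for x in xs) + sum((y - ybar) ** 2 for y in ys)
    # (n + m) * w: squared deviations about the pooled mean
    ss_pooled = sum((x - u) ** 2 for x in xs) + sum((y - u) ** 2 for y in ys)
    lam_pow = ss_within / ss_pooled          # equals w'/w
    t = math.sqrt(n * m / (n + m)) * (xbar - ybar) / math.sqrt(ss_within / (n + m - 2))
    return lam_pow, t

xs = [5.1, 4.8, 6.0, 5.5]                    # hypothetical sample from the first distribution
ys = [4.2, 4.9, 3.8, 4.4, 4.7]               # hypothetical sample from the second distribution
lam_pow, t = two_sample_lrt(xs, ys)
df = len(xs) + len(ys) - 2
identity_rhs = df / (df + t * t)
print(lam_pow, identity_rhs, t)
```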


In each of the two examples of this section it was found that the
likelihood ratio test could be based on a statistic which, when the
hypothesis $H_0$ is true, has a $t$-distribution. To help us compute the
powers of these tests at parameter points other than those described
by the hypothesis $H_0$, we turn to the following definition.

Definition 3. Let the random variable $W$ be $N(\delta, 1)$; let the random
variable $V$ be $\chi^2(r)$, and $W$ and $V$ be independent. The quotient

$$T = \frac{W}{\sqrt{V/r}}$$

is said to have a noncentral $t$-distribution with $r$ degrees of freedom and
noncentrality parameter $\delta$. If $\delta = 0$, we say that $T$ has a central
$t$-distribution.


In the light of this definition, let us reexamine the statistics of the
examples of this section. In Example 1 we had

$$t(X_1, \ldots, X_n) = \frac{\sqrt{n}\,\bar{X}}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}} = \frac{\sqrt{n}\,\bar{X}/\sigma}{\sqrt{\sum_{1}^{n}(X_i - \bar{X})^2/[\sigma^2(n - 1)]}}.$$

Here $W_1 = \sqrt{n}\,\bar{X}/\sigma$ is $N(\sqrt{n}\,\theta_1/\sigma, 1)$, $V_1 = \sum_{1}^{n}(X_i - \bar{X})^2/\sigma^2$ is $\chi^2(n - 1)$,
and $W_1$ and $V_1$ are independent. Thus, if $\theta_1 \ne 0$, we see, in accordance
with the definition, that $t(X_1, \ldots, X_n)$ has a noncentral $t$-distribution
with $n - 1$ degrees of freedom and noncentrality parameter
$\delta_1 = \sqrt{n}\,\theta_1/\sigma$. In Example 2 we had

$$T = \frac{W_2}{\sqrt{V_2/(n + m - 2)}},$$

where

$$W_2 = \sqrt{\frac{nm}{n + m}}\,\frac{\bar{X} - \bar{Y}}{\sigma}$$

and

$$V_2 = \frac{\sum_{1}^{n}(X_i - \bar{X})^2 + \sum_{1}^{m}(Y_i - \bar{Y})^2}{\sigma^2}.$$

Here $W_2$ is $N[\sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma, 1]$, $V_2$ is $\chi^2(n + m - 2)$, and $W_2$
and $V_2$ are independent. Accordingly, if $\theta_1 \ne \theta_2$, $T$ has a noncentral
$t$-distribution with $n + m - 2$ degrees of freedom and noncentrality
parameter $\delta_2 = \sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma$. It is interesting to note that
$\delta_1 = \sqrt{n}\,\theta_1/\sigma$ measures the deviation of $\theta_1$ from $\theta_1 = 0$ in units of the
standard deviation $\sigma/\sqrt{n}$ of $\bar{X}$. The noncentrality parameter
$\delta_2 = \sqrt{nm/(n + m)}\,(\theta_1 - \theta_2)/\sigma$ is equal to the deviation of $\theta_1 - \theta_2$ from
$\theta_1 - \theta_2 = 0$ in units of the standard deviation $\sigma\sqrt{(n + m)/nm}$ of $\bar{X} - \bar{Y}$.

There are various tables of the noncentral $t$-distribution, but they
are much too cumbersome to be included in this book. However, with
the aid of such tables, we can determine the power functions of these
tests as functions of the noncentrality parameters.

In Example 2, in testing the equality of the means of two normal
distributions, it was assumed that the unknown variances of the
distributions were equal. Let us now consider the problem of testing
the equality of these two unknown variances.

Example 3. We are given the independent random samples $X_1, \ldots, X_n$
and $Y_1, \ldots, Y_m$ from the distributions, which are $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$,
respectively. We have

$$\Omega = \{(\theta_1, \theta_2, \theta_3, \theta_4): -\infty < \theta_1, \theta_2 < \infty,\ 0 < \theta_3, \theta_4 < \infty\}.$$

The hypothesis $H_0: \theta_3 = \theta_4$, unspecified, with $\theta_1$ and $\theta_2$ also unspecified, is to
be tested against all alternatives. Then

$$\omega = \{(\theta_1, \theta_2, \theta_3, \theta_4): -\infty < \theta_1, \theta_2 < \infty,\ 0 < \theta_3 = \theta_4 < \infty\}.$$

It is easy to show (see Exercise 9.30) that the statistic defined by $\lambda = L(\hat{\omega})/L(\hat{\Omega})$
is a function of the statistic

$$F = \frac{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}{\sum_{1}^{m}(Y_i - \bar{Y})^2/(m - 1)}.$$

If $\theta_3 = \theta_4$, this statistic $F$ has an $F$-distribution with $n - 1$ and $m - 1$ degrees
of freedom. The hypothesis that $(\theta_1, \theta_2, \theta_3, \theta_4) \in \omega$ is rejected if the computed
$F \le c_1$ or if the computed $F \ge c_2$. The constants $c_1$ and $c_2$ are usually selected
so that, if $\theta_3 = \theta_4$,

$$\Pr(F \le c_1) = \Pr(F \ge c_2) = \frac{\alpha_1}{2},$$

where $\alpha_1$ is the desired significance level of this test.


Often, under $H_0$, it is difficult to determine the distribution of
$\lambda = \lambda(X_1, X_2, \ldots, X_n)$ or the distribution of an equivalent statistic
upon which to base the likelihood ratio test. Hence it is impossible to
find $\lambda_0$ such that $\Pr[\lambda \le \lambda_0;\ H_0]$ equals an appropriate value of $\alpha$. The
fact that the maximum likelihood estimators in a regular case have a
joint normal distribution does, however, provide a solution. Using this
fact, in a more advanced course, we can show that $-2 \ln \lambda$ has, given
$H_0$ is true, an approximate chi-square distribution with $r$ degrees of
freedom, where $r$ = the dimension of $\Omega$ − the dimension of $\omega$. For
illustration, in Example 1, the dimension of $\Omega = 2$ and the dimension
of $\omega = 1$ and $r = 2 - 1 = 1$.

Also, in that example, note that

$$-2 \ln \lambda = n \ln\left(1 + \frac{n\bar{x}^2}{\sum_{1}^{n}(x_i - \bar{x})^2}\right) = n \ln\left(1 + \frac{\bar{x}^2}{s^2}\right).$$

Hence, with $n$ large so that $\bar{x}^2/s^2$ is close to zero under $H_0: \theta_1 = 0$, let
us approximate the right-hand member by two terms of a Taylor's
series expanded about zero:

$$-2 \ln \lambda \approx 0 + \frac{n\bar{x}^2}{s^2}.$$

Since $n$ is large, we can replace $n$ by $n - 1$ to get the approximation

$$-2 \ln \lambda \approx \left(\frac{\bar{x}}{s/\sqrt{n - 1}}\right)^2 = t^2.$$

But $T = \bar{X}/(S/\sqrt{n - 1})$ under $H_0: \theta_1 = 0$ has a $t$-distribution with
$n - 1$ degrees of freedom. Moreover, with large $n - 1$, the distribution
of $T$ is approximately $N(0, 1)$ and the square of a standardized normal
variable is $\chi^2(1)$, which is in agreement with the stated result. Exercise
9.31 provides another illustration of the fact that $-2 \ln \lambda$ has an
approximate chi-square distribution.
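The quality of the approximation $-2 \ln \lambda \approx t^2$ can be seen by holding $t$ fixed and letting $n$ grow. The sketch below relies on the exact relation $-2 \ln \lambda = n \ln[1 + t^2/(n - 1)]$ for the test of Example 1; the value $t = 1.5$ is a hypothetical observed statistic chosen for illustration.

```python
import math

def minus_two_log_lambda(t, n):
    """Exact -2 ln(lambda) for Example 1, written in terms of t and n."""
    return n * math.log(1 + t * t / (n - 1))

t = 1.5                                   # hypothetical observed t value
sizes = (5, 20, 100, 1000)
gaps = []
for n in sizes:
    exact = minus_two_log_lambda(t, n)
    gaps.append(abs(exact - t * t))       # distance from the approximation t^2
    print(f"n = {n:5d}   -2 ln(lambda) = {exact:.4f}   t^2 = {t * t:.4f}")
```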
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EXERCISES 

9.23. In Example 1 let $n = 10$, and let the experimental values of the random
variables yield $\bar{x} = 0.6$ and $\sum_{1}^{10}(x_i - \bar{x})^2 = 3.6$. If the test derived in that
example is used, do we accept or reject $H_0: \theta_1 = 0$ at the 5 percent
significance level?

9.24. In Example 2 let $n = m = 8$, $\bar{x} = 75.2$, $\bar{y} = 78.6$, $\sum_{1}^{8}(x_i - \bar{x})^2 = 71.2$,
$\sum_{1}^{8}(y_i - \bar{y})^2 = 54.8$. If we use the test derived in that example, do we accept
or reject $H_0: \theta_1 = \theta_2$ at the 5 percent significance level?

9.25. Show that the likelihood ratio principle leads to the same test, when
testing a simple hypothesis $H_0$ against an alternative simple hypothesis $H_1$,
as that given by the Neyman-Pearson theorem. Note that there are only
two points in $\Omega$.

9.26. Verify Equations (1) of Example 2 of this section.

9.27. Verify Equations (2) of Example 2 of this section.

9.28. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(\theta, 1)$. Show that the likelihood ratio principle for testing $H_0: \theta = \theta'$,
where $\theta'$ is specified, against $H_1: \theta \ne \theta'$ leads to the inequality $|\bar{x} - \theta'| \ge c$.
Is this a uniformly most powerful test of $H_0$ against $H_1$?

9.29. Let $X_1, X_2, \ldots, X_n$ be a random sample from the normal distribution
$N(\theta_1, \theta_2)$. Show that the likelihood ratio principle for testing $H_0: \theta_2 = \theta_2'$
specified, and $\theta_1$ unspecified, against $H_1: \theta_2 \ne \theta_2'$, $\theta_1$ unspecified, leads to a
test that rejects when $\sum_{1}^{n}(x_i - \bar{x})^2 \le c_1$ or $\sum_{1}^{n}(x_i - \bar{x})^2 \ge c_2$, where $c_1 < c_2$
are selected appropriately.


9.30. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_m$ be independent random samples from
the distributions $N(\theta_1, \theta_3)$ and $N(\theta_2, \theta_4)$, respectively.

(a) Show that the likelihood ratio for testing $H_0: \theta_1 = \theta_2$, $\theta_3 = \theta_4$ against
all alternatives is given by

$$\frac{\left[\sum_{1}^{n}(x_i - \bar{x})^2/n\right]^{n/2}\left[\sum_{1}^{m}(y_i - \bar{y})^2/m\right]^{m/2}}{\left\{\left[\sum_{1}^{n}(x_i - u)^2 + \sum_{1}^{m}(y_i - u)^2\right]\Big/(m + n)\right\}^{(n+m)/2}},$$

where $u = (n\bar{x} + m\bar{y})/(n + m)$.

(b) Show that the likelihood ratio test for testing $H_0: \theta_3 = \theta_4$, $\theta_1$ and $\theta_2$
unspecified, against $H_1: \theta_3 \ne \theta_4$, $\theta_1$ and $\theta_2$ unspecified, can be based on
the random variable

$$F = \frac{\sum_{1}^{n}(X_i - \bar{X})^2/(n - 1)}{\sum_{1}^{m}(Y_i - \bar{Y})^2/(m - 1)}.$$

(c) If $\theta_3 = \theta_4$, argue that the $F$-statistic in part (b) is independent of the
$T$-statistic of Example 2 of this section.

9.31. Let $n$ independent trials of an experiment be such that $x_1, x_2, \ldots, x_k$ are
the respective numbers of times that the experiment ends in the mutually
exclusive and exhaustive events $A_1, A_2, \ldots, A_k$. If $p_i = P(A_i)$ is constant
throughout the $n$ trials, then the probability of that particular sequence of
trials is $L = p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$.

(a) Recalling that $p_1 + p_2 + \cdots + p_k = 1$, show that the likelihood ratio for
testing $H_0: p_i = p_{0i} > 0$, $i = 1, 2, \ldots, k$, against all alternatives is given
by

$$\lambda = \prod_{i=1}^{k}\left[\frac{(p_{0i})^{x_i}}{(x_i/n)^{x_i}}\right].$$

(b) Show that

$$-2 \ln \lambda = \sum_{i=1}^{k} \frac{x_i (x_i - n p_{0i})^2}{(n p_i')^2},$$

where $p_i'$ is between $p_{0i}$ and $x_i/n$.

Hint: Expand $\ln p_{0i}$ in a Taylor's series with the remainder in the
term involving $(p_{0i} - x_i/n)^2$.

(c) For large $n$, argue that $x_i/(n p_i')^2$ is approximated by $1/(n p_{0i})$ and hence

$$-2 \ln \lambda \approx \sum_{i=1}^{k} \frac{(x_i - n p_{0i})^2}{n p_{0i}}, \quad \text{when } H_0 \text{ is true.}$$

In Section 6.6 we said the right-hand member of this last equation
defines a statistic that has an approximate chi-square distribution
with $k - 1$ degrees of freedom. Note that

dimension of $\Omega$ − dimension of $\omega$ = $(k - 1) - 0 = k - 1$.

9.32. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size $n = 5$ from a distribution with p.d.f. $f(x; \theta) = \frac{1}{2} e^{-|x - \theta|}$, $-\infty < x < \infty$,
for all real $\theta$. Find the likelihood ratio test $\lambda$ for testing $H_0: \theta = \theta_0$ against
$H_1: \theta \ne \theta_0$.

9.33. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent random samples
from the two normal distributions $N(0, \theta_1)$ and $N(0, \theta_2)$.

(a) Find the likelihood ratio $\lambda$ for testing the composite hypothesis
$H_0: \theta_1 = \theta_2$ against the composite alternative $H_1: \theta_1 \ne \theta_2$.

(b) This $\lambda$ is a function of what $F$-statistic that would actually be used in
this test?

9.34. A random sample $X_1, X_2, \ldots, X_n$ arises from a distribution given by

$$H_0: f(x; \theta) = \frac{1}{\theta}, \quad 0 < x < \theta, \text{ zero elsewhere,}$$

or

$$H_1: f(x; \theta) = \frac{1}{\theta} e^{-x/\theta}, \quad 0 < x < \infty, \text{ zero elsewhere.}$$

Determine the likelihood ratio ($\lambda$) test associated with the test of $H_0$ against
$H_1$.

9.35. Let $X$ and $Y$ be two independent random variables with respective
probability density functions

$$f(x; \theta_i) = \left(\frac{1}{\theta_i}\right) e^{-x/\theta_i}, \quad 0 < x < \infty,$$

zero elsewhere, $i = 1, 2$. To test $H_0: \theta_1 = \theta_2$ against $H_1: \theta_1 \ne \theta_2$, two
independent random samples of sizes $n_1$ and $n_2$, respectively, were taken
from these distributions. Find the likelihood ratio $\lambda$ and show that $\lambda$ can
be written as a function of a statistic having an $F$-distribution, under $H_0$.

9.36. Consider the two uniform distributions with respective probability
density functions

$$f(x; \theta_i) = \frac{1}{2\theta_i}, \quad -\theta_i < x < \theta_i,$$

zero elsewhere, $i = 1, 2$. The null hypothesis is $H_0: \theta_1 = \theta_2$ while
the alternative is $H_1: \theta_1 \ne \theta_2$. Let $X_1 < X_2 < \cdots < X_{n_1}$ and
$Y_1 < Y_2 < \cdots < Y_{n_2}$ be the order statistics of two independent random
samples from the two distributions, respectively. Using the likelihood
ratio $\lambda$, find the statistic used to test $H_0$ against $H_1$. Find the distribution
of $-2 \ln \lambda$ when $H_0$ is true. Note that in this nonregular case the number
of degrees of freedom is two times the difference of the dimensions of $\Omega$
and $\omega$.

9.4 The Sequential Probability Ratio Test

In Section 9.1 we proved a theorem that provided us with a
method for determining a best critical region for testing a simple
hypothesis against an alternative simple hypothesis. The theorem was
as follows. Let $X_1, X_2, \ldots, X_n$ be a random sample with fixed
sample size $n$ from a distribution that has p.d.f. $f(x; \theta)$, where
$\theta \in \{\theta: \theta = \theta', \theta''\}$ and $\theta'$ and $\theta''$ are known numbers. Let the joint p.d.f.
of $X_1, X_2, \ldots, X_n$ be denoted by

$$L(\theta, n) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta),$$

a notation that reveals both the parameter $\theta$ and the sample size $n$. If
we reject $H_0: \theta = \theta'$ and accept $H_1: \theta = \theta''$ when and only when

$$\frac{L(\theta', n)}{L(\theta'', n)} \le k,$$

where $k > 0$, then this is a best test of $H_0$ against $H_1$.

Let us now suppose that the sample size $n$ is not fixed in advance.
In fact, let the sample size be a random variable $N$ with sample space
$\{n: n = 1, 2, 3, \ldots\}$. An interesting procedure for testing the simple
hypothesis $H_0: \theta = \theta'$ against the simple hypothesis $H_1: \theta = \theta''$ is
the following. Let $k_0$ and $k_1$ be two positive constants with $k_0 < k_1$.
Observe the independent outcomes $X_1, X_2, X_3, \ldots$ in sequence, say
$x_1, x_2, x_3, \ldots$, and compute

$$\frac{L(\theta', 1)}{L(\theta'', 1)}, \quad \frac{L(\theta', 2)}{L(\theta'', 2)}, \quad \frac{L(\theta', 3)}{L(\theta'', 3)}, \quad \ldots.$$

The hypothesis $H_0: \theta = \theta'$ is rejected (and $H_1: \theta = \theta''$ is accepted) if
and only if there exists a positive integer $n$ so that $(x_1, x_2, \ldots, x_n)$
belongs to the set

$$C_n = \left\{(x_1, \ldots, x_n): k_0 < \frac{L(\theta', j)}{L(\theta'', j)} < k_1,\ j = 1, \ldots, n - 1, \text{ and } \frac{L(\theta', n)}{L(\theta'', n)} \le k_0\right\}.$$

On the other hand, the hypothesis $H_0: \theta = \theta'$ is accepted (and
$H_1: \theta = \theta''$ is rejected) if and only if there exists a positive integer $n$ so
that $(x_1, x_2, \ldots, x_n)$ belongs to the set

$$B_n = \left\{(x_1, \ldots, x_n): k_0 < \frac{L(\theta', j)}{L(\theta'', j)} < k_1,\ j = 1, 2, \ldots, n - 1, \text{ and } \frac{L(\theta', n)}{L(\theta'', n)} \ge k_1\right\}.$$

That is, we continue to observe sample observations as long as

$$k_0 < \frac{L(\theta', n)}{L(\theta'', n)} < k_1. \qquad (1)$$

We stop these observations in one of two ways:

1. With rejection of $H_0: \theta = \theta'$ as soon as

$$\frac{L(\theta', n)}{L(\theta'', n)} \le k_0,$$

or

2. with acceptance of $H_0: \theta = \theta'$ as soon as

$$\frac{L(\theta', n)}{L(\theta'', n)} \ge k_1.$$

A test of this kind is called Wald's sequential probability ratio test.
Now, frequently inequality (1) can be conveniently expressed in an
equivalent form

$$c_0(n) < u(x_1, x_2, \ldots, x_n) < c_1(n),$$

where $u(X_1, X_2, \ldots, X_n)$ is a statistic and $c_0(n)$ and $c_1(n)$ depend on the
constants $k_0$, $k_1$, $\theta'$, $\theta''$, and on $n$. Then the observations are stopped
and a decision is reached as soon as

$$u(x_1, x_2, \ldots, x_n) \le c_0(n) \quad \text{or} \quad u(x_1, x_2, \ldots, x_n) \ge c_1(n).$$

We now give an illustrative example.

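Example 1. Let $X$ have a p.d.f.

$$f(x; \theta) = \theta^x(1 - \theta)^{1 - x}, \quad x = 0, 1,$$

and $f(x; \theta) = 0$ elsewhere.

In the preceding discussion of a sequential probability ratio test, let $H_0: \theta = \frac{1}{3}$
and $H_1: \theta = \frac{2}{3}$; then, with $\sum x_i = \sum_{i=1}^{n} x_i$,

$$\frac{L(\frac{1}{3}, n)}{L(\frac{2}{3}, n)} = \frac{(\frac{1}{3})^{\sum x_i}(\frac{2}{3})^{n - \sum x_i}}{(\frac{2}{3})^{\sum x_i}(\frac{1}{3})^{n - \sum x_i}} = 2^{\,n - 2\sum x_i}.$$

If we take logarithms to the base 2, the inequality

$$k_0 < \frac{L(\frac{1}{3}, n)}{L(\frac{2}{3}, n)} < k_1,$$

with $0 < k_0 < k_1$, becomes

$$\log_2 k_0 < n - 2\sum_{1}^{n} x_i < \log_2 k_1,$$

or, equivalently,

$$c_0(n) = \frac{n}{2} - \frac{1}{2}\log_2 k_1 < \sum_{1}^{n} x_i < \frac{n}{2} - \frac{1}{2}\log_2 k_0 = c_1(n).$$

Note that $L(\frac{1}{3}, n)/L(\frac{2}{3}, n) \le k_0$ if and only if $c_1(n) \le \sum_{1}^{n} x_i$; and
$L(\frac{1}{3}, n)/L(\frac{2}{3}, n) \ge k_1$ if and only if $c_0(n) \ge \sum_{1}^{n} x_i$. Thus we continue to observe
outcomes as long as $c_0(n) < \sum_{1}^{n} x_i < c_1(n)$. The observation of outcomes is
discontinued with the first value $n$ of $N$ for which either $c_1(n) \le \sum_{1}^{n} x_i$ or
$c_0(n) \ge \sum_{1}^{n} x_i$. The inequality $c_1(n) \le \sum_{1}^{n} x_i$ leads to the rejection of $H_0: \theta = \frac{1}{3}$
(the acceptance of $H_1$), and the inequality $c_0(n) \ge \sum_{1}^{n} x_i$ leads to the acceptance
of $H_0: \theta = \frac{1}{3}$ (the rejection of $H_1$).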
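The stopping rule of this example translates directly into a short routine. The sketch below is one possible rendering, assuming the hypotheses $H_0: \theta = \frac{1}{3}$ and $H_1: \theta = \frac{2}{3}$ of the example; the constants $k_0 = \frac{1}{8}$ and $k_1 = 8$ and the outcome sequences are hypothetical choices made for illustration, and the function name `sprt_bernoulli` is not from the text.

```python
import math

def sprt_bernoulli(outcomes, k0, k1):
    """Sequential probability ratio test of H0: theta = 1/3 against H1: theta = 2/3.

    Returns (decision, n), where decision is 'reject H0', 'accept H0', or
    'continue' when the outcomes run out before either boundary is reached.
    """
    total = 0
    for n, x in enumerate(outcomes, start=1):
        total += x
        c0 = n / 2 - 0.5 * math.log2(k1)    # lower boundary c_0(n)
        c1 = n / 2 - 0.5 * math.log2(k0)    # upper boundary c_1(n)
        if total >= c1:                     # likelihood ratio has fallen to k0 or below
            return "reject H0", n
        if total <= c0:                     # likelihood ratio has risen to k1 or above
            return "accept H0", n
    return "continue", len(outcomes)

k0, k1 = 1 / 8, 8
print(sprt_bernoulli([1, 1, 0, 1, 1, 1], k0, k1))
print(sprt_bernoulli([0, 0, 0], k0, k1))
print(sprt_bernoulli([1, 0, 1, 0], k0, k1))
```

With these constants the boundaries are $c_0(n) = n/2 - 1.5$ and $c_1(n) = n/2 + 1.5$, so sampling stops the first time the number of ones leads or trails half the sample size by 1.5.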

* 

Remarks. At this point, the reader undoubtedly sees that there are many
questions that should be raised in connection with the sequential probability
ratio test. Some of these questions are possibly among the following:

1. What is the probability of the procedure continuing indefinitely?

2. What is the value of the power function of this test at each of the points
$\theta = \theta'$ and $\theta = \theta''$?

3. If $\theta''$ is one of several values of $\theta$ specified by an alternative composite
hypothesis, say $H_1: \theta > \theta'$, what is the power function at each point $\theta \ge \theta'$?

4. Since the sample size $N$ is a random variable, what are some of the
properties of the distribution of $N$? In particular, what is the expected value
$E(N)$ of $N$?

5. How does this test compare with tests that have a fixed sample size $n$?

A course in sequential analysis would investigate these and many other
problems. However, in this book our objective is largely that of acquainting
the reader with this kind of test procedure. Accordingly, we assert that the
answer to question 1 is zero. Moreover, it can be proved that if $\theta = \theta'$ or if
$\theta = \theta''$, $E(N)$ is smaller, for this sequential procedure, than the sample size of
a fixed-sample-size test which has the same values of the power function at
those points. We now consider question 2 in some detail.

In this section we shall denote the power of the test when $H_0$ is
true by the symbol $\alpha$ and the power of the test when $H_1$ is true by
the symbol $1 - \beta$. Thus $\alpha$ is the probability of committing a
type I error (the rejection of $H_0$ when $H_0$ is true), and $\beta$ is the
probability of committing a type II error (the acceptance of $H_0$ when
$H_0$ is false). With the sets $C_n$ and $B_n$ as previously defined, and
with random variables of the continuous type, we then have

\[ \alpha = \sum_{n=1}^{\infty} \int_{C_n} L(\theta', n), \qquad
   1 - \beta = \sum_{n=1}^{\infty} \int_{C_n} L(\theta'', n). \]

Since the probability is 1 that the procedure will terminate, we also
have

\[ 1 - \alpha = \sum_{n=1}^{\infty} \int_{B_n} L(\theta', n), \qquad
   \beta = \sum_{n=1}^{\infty} \int_{B_n} L(\theta'', n). \]


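If $(x_1, x_2, \ldots, x_n) \in C_n$, we have
$L(\theta', n) \le k_0 L(\theta'', n)$; hence it is clear that

\[ \alpha = \sum_{n=1}^{\infty} \int_{C_n} L(\theta', n)
   \le \sum_{n=1}^{\infty} \int_{C_n} k_0 L(\theta'', n)
   = k_0 (1 - \beta). \]

Because $L(\theta', n) \ge k_1 L(\theta'', n)$ at each point of the set
$B_n$, we have

\[ 1 - \alpha = \sum_{n=1}^{\infty} \int_{B_n} L(\theta', n)
   \ge \sum_{n=1}^{\infty} \int_{B_n} k_1 L(\theta'', n)
   = k_1 \beta. \]

Accordingly, it follows that

\[ \frac{\alpha}{1 - \beta} \le k_0, \qquad
   k_1 \le \frac{1 - \alpha}{\beta}, \tag{2} \]

provided that $\beta$ is not equal to zero or 1.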

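Now let $\alpha_a$ and $\beta_a$ be preassigned proper fractions; some
typical values in the applications are 0.01, 0.05, and 0.10. If we take

\[ k_0 = \frac{\alpha_a}{1 - \beta_a}, \qquad
   k_1 = \frac{1 - \alpha_a}{\beta_a}, \]

then inequalities (2) become

\[ \frac{\alpha}{1 - \beta} \le \frac{\alpha_a}{1 - \beta_a}, \qquad
   \frac{1 - \alpha_a}{\beta_a} \le \frac{1 - \alpha}{\beta}; \tag{3} \]

or, equivalently,

\[ \alpha (1 - \beta_a) \le (1 - \beta) \alpha_a, \qquad
   \beta (1 - \alpha_a) \le (1 - \alpha) \beta_a. \]

If we add corresponding members of the immediately preceding
inequalities, we find that

\[ \alpha + \beta - \alpha\beta_a - \beta\alpha_a
   \le \alpha_a + \beta_a - \beta\alpha_a - \alpha\beta_a \]

and hence

\[ \alpha + \beta \le \alpha_a + \beta_a. \]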

That is, the sum $\alpha + \beta$ of the probabilities of the two kinds of
errors is bounded above by the sum $\alpha_a + \beta_a$ of the preassigned
numbers. Moreover, since $\alpha$ and $\beta$ are positive proper fractions,
inequalities (3) imply that

\[ \alpha \le \frac{\alpha_a}{1 - \beta_a}, \qquad
   \beta \le \frac{\beta_a}{1 - \alpha_a}; \]

consequently, we have an upper bound on each of $\alpha$ and $\beta$. Various
investigations of the sequential probability ratio test seem to indicate
that in most practical cases, the values of $\alpha$ and $\beta$ are quite
close to $\alpha_a$ and $\beta_a$. This prompts us to approximate the power
function at the points $\theta = \theta'$ and $\theta = \theta''$ by
$\alpha_a$ and $1 - \beta_a$, respectively.
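As a small numerical companion to the choice of $k_0$ and $k_1$ and to the
resulting upper bounds on $\alpha$ and $\beta$, the following sketch
evaluates them for given preassigned values. The function names
`sprt_constants` and `error_bounds` are hypothetical, introduced only for
this illustration.

```python
def sprt_constants(alpha_a, beta_a):
    """Return (k0, k1) for preassigned proper fractions alpha_a, beta_a."""
    if not (0 < alpha_a < 1 and 0 < beta_a < 1):
        raise ValueError("alpha_a and beta_a must be proper fractions")
    k0 = alpha_a / (1 - beta_a)
    k1 = (1 - alpha_a) / beta_a
    return k0, k1


def error_bounds(alpha_a, beta_a):
    """Upper bounds on the actual alpha and beta implied by inequalities (3)."""
    return alpha_a / (1 - beta_a), beta_a / (1 - alpha_a)


k0, k1 = sprt_constants(0.10, 0.10)             # about 1/9 and 9
alpha_max, beta_max = error_bounds(0.10, 0.10)  # each about 0.111
print(k0, k1, alpha_max, beta_max)
```

Note that the bound on $\alpha$ coincides with $k_0$ itself, so with
$\alpha_a = \beta_a = 0.10$ neither error probability can exceed about 0.111.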


Example 2. Let $X$ be $N(\theta, 100)$. To find the sequential probability
ratio test for testing $H_0: \theta = 75$ against $H_1: \theta = 78$ such
that each of $\alpha$ and $\beta$ is approximately equal to 0.10, take

\[ k_0 = \frac{0.10}{1 - 0.10} = \frac19, \qquad
   k_1 = \frac{1 - 0.10}{0.10} = 9. \]

Since

\[ \frac{L(75, n)}{L(78, n)}
   = \frac{\exp\left[-\sum (x_i - 75)^2 / 2(100)\right]}
          {\exp\left[-\sum (x_i - 78)^2 / 2(100)\right]}
   = \exp\left(-\frac{6 \sum x_i - 459n}{200}\right), \]

the inequality

\[ k_0 = \frac19 < \frac{L(75, n)}{L(78, n)} < 9 = k_1 \]

can be rewritten, by taking logarithms, as

\[ -\ln 9 < \frac{6 \sum x_i - 459n}{200} < \ln 9. \]

This inequality is equivalent to the inequality

\[ c_0(n) = \frac{153}{2} n - \frac{100}{3} \ln 9 < \sum_{i=1}^n x_i
   < \frac{153}{2} n + \frac{100}{3} \ln 9 = c_1(n). \]

Moreover, $L(75, n)/L(78, n) \le k_0$ and $L(75, n)/L(78, n) \ge k_1$ are
equivalent to the inequalities $\sum_1^n x_i \ge c_1(n)$ and
$\sum_1^n x_i \le c_0(n)$, respectively. Thus the observation of outcomes is
discontinued with the first value $n$ of $N$ for which either
$\sum_1^n x_i \ge c_1(n)$ or $\sum_1^n x_i \le c_0(n)$. The inequality
$\sum_1^n x_i \ge c_1(n)$ leads to the rejection of $H_0: \theta = 75$, and
the inequality $\sum_1^n x_i \le c_0(n)$ leads to the acceptance of
$H_0: \theta = 75$. The power of the test is approximately 0.10 when $H_0$
is true, and approximately 0.90 when $H_1$ is true.
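The boundaries of Example 2 are easy to tabulate. The sketch below assumes
the constants $k_0 = 1/9$ and $k_1 = 9$ of that example; the function name
`sprt_normal` and the sample values are hypothetical.

```python
import math


def sprt_normal(xs, k0=1 / 9, k1=9.0):
    """Sequential test of H0: theta = 75 vs H1: theta = 78 for N(theta, 100).

    Returns (decision, n); the decision is "continue" when the data run out
    before a boundary is reached.
    """
    total = 0.0
    for n, x in enumerate(xs, start=1):
        total += x
        c0 = 76.5 * n - (100 / 3) * math.log(k1)  # c_0(n)
        c1 = 76.5 * n - (100 / 3) * math.log(k0)  # c_1(n)
        if total >= c1:
            return "reject H0", n
        if total <= c0:
            return "accept H0", n
    return "continue", len(xs)


# Six hypothetical observations, each equal to 90.
print(sprt_normal([90.0] * 6))
```

Each observation of 90 moves the sum 13.5 units above the center line
$76.5n$, and the half-width $(100/3)\ln 9 \approx 73.2$ is first exceeded at
$n = 6$, where $H_0$ is rejected.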

Remark. It is interesting to note that a sequential probability ratio test can
be thought of as a random-walk procedure. For illustrations, the final
inequalities of Examples 1 and 2 can be rewritten as

\[ -\log_2 k_1 < \sum_{i=1}^n 2(x_i - 0.5) < -\log_2 k_0 \]

and

\[ -\frac{100}{3} \ln 9 < \sum_{i=1}^n (x_i - 76.5) < \frac{100}{3} \ln 9, \]

respectively. In each instance, we can think of starting at the point zero and
taking random steps until one of the boundaries is reached. In the first
situation the random steps are $2(X_1 - 0.5)$, $2(X_2 - 0.5)$,
$2(X_3 - 0.5), \ldots$ and hence are of the same length, 1, but with random
directions. In the second instance, both the length and the direction of the
steps are random variables, $X_1 - 76.5$, $X_2 - 76.5$, $X_3 - 76.5, \ldots$.

In recent years, there has been much attention to improving quality
of products using statistical methods. One such simple method was
developed by Walter Shewhart in which a sample of size $n$ of the items
being produced is taken and they are measured, resulting in $n$ values.
The mean $\bar x$ of these $n$ measurements has an approximate normal
distribution with mean $\mu$ and variance $\sigma^2/n$. In practice, $\mu$
and $\sigma^2$ must be estimated, but in this discussion, we assume that
they are known. From theory we know that the probability is 0.997 that
$\bar x$ is between

\[ \text{LCL} = \mu - \frac{3\sigma}{\sqrt{n}} \quad \text{and} \quad
   \text{UCL} = \mu + \frac{3\sigma}{\sqrt{n}}. \]

These two values are called the lower (LCL) and upper (UCL) control
limits, respectively. Samples like this are taken periodically, resulting
in a sequence of means, say $\bar x_1, \bar x_2, \bar x_3, \ldots$. These
are usually plotted; and if they are between the LCL and UCL, we say that
the process is in control. If one falls outside the limits, this would
suggest that the mean $\mu$ has shifted, and the process would be
investigated.

It was recognized by some that there could be a shift in the mean,
say from $\mu$ to $\mu + (\sigma/\sqrt{n})$; and it would still be difficult
to detect that shift with a single sample mean as now the probability of a
single $\bar x$ exceeding UCL is only about 0.023. This means that we would
need about $1/0.023 \approx 43$ samples, each of size $n$, on the average
before detecting such a shift. This seems too long; so statisticians
recognized that they should be cumulating experience as the sequence
$\bar X_1, \bar X_2, \bar X_3, \ldots$ is observed in order to help them
detect the shift sooner. It is the practice to compute the standardized
variable $Z = (\bar X - \mu)/(\sigma/\sqrt{n})$; thus we state the problem in
these terms and provide the solution given by a sequential probability ratio
test.

Here $Z$ is $N(\theta, 1)$, and we wish to test $H_0: \theta = 0$ against
$H_1: \theta = 1$ using the sequence of i.i.d. random variables
$Z_1, Z_2, \ldots, Z_m, \ldots$. We use $m$ rather than $n$, as the latter
is the size of the samples taken periodically. We have

\[ \frac{L(0, m)}{L(1, m)}
   = \frac{\exp\left[-\sum z_i^2 / 2\right]}
          {\exp\left[-\sum (z_i - 1)^2 / 2\right]}
   = \exp\left[-\sum_{i=1}^m (z_i - 0.5)\right]. \]

Thus

\[ k_0 < \exp\left[-\sum_{i=1}^m (z_i - 0.5)\right] < k_1 \]

can be rewritten as

\[ h = -\ln k_0 > \sum_{i=1}^m (z_i - 0.5) > -\ln k_1 = -h. \]

It is true that $-\ln k_0 = \ln k_1$ when $\alpha_a = \beta_a$. Often,
$h = -\ln k_0$ is taken to be about 4 or 5, suggesting that
$\alpha_a = \beta_a$ is small, like 0.01. As $\sum (z_i - 0.5)$ is
cumulating the sum of $z_i - 0.5$, $i = 1, 2, 3, \ldots$, these
procedures are often called CUSUMS. If the CUSUM $= \sum (z_i - 0.5)$
exceeds $h$, we would investigate the process, as it seems that the mean
has shifted upward. If this shift is to $\theta = 1$, the theory associated
with these procedures shows that we need only 8 or 9 samples on the average,
rather than 43, to detect this shift. For more information about these
methods, the reader is referred to one of the many books on quality
improvement through statistical methods. What we would like to
emphasize here is that, through sequential methods (not only the
sequential probability ratio test), we should take advantage of all past
experience that we can gather in making inferences.
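The two numerical claims above, the single-sample exceedance probability of
about 0.023 and the CUSUM stopping rule, can be illustrated with a short
sketch. It follows only what the text states: the sum of $z_i - 0.5$ is
accumulated and compared with $h$, with no resetting of the sum. The function
names and the sample values are hypothetical.

```python
import math


def upper_tail(z):
    """Pr(Z > z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def cusum_alarm(zs, h=4.0, ref=0.5):
    """Return the first m with sum_{i<=m} (z_i - ref) > h, or None."""
    s = 0.0
    for m, z in enumerate(zs, start=1):
        s += z - ref
        if s > h:
            return m
    return None


# After a shift of one standard error, the sample mean exceeds UCL
# exactly when the standardized mean exceeds 2.
p = upper_tail(2.0)
print(round(p, 3), round(1 / p))

# Hypothetical standardized means, each equal to 1.5.
print(cusum_alarm([1.5] * 10))
```

With every standardized mean equal to 1.5 the cumulative sum grows by 1 per
sample, so it first exceeds $h = 4$ at the fifth sample.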
EXERCISES

9.37. Let $X$ be $N(0, \theta)$ and, in the notation of this section, let
$\theta' = 4$, $\theta'' = 9$, $\alpha_a = 0.05$, and $\beta_a = 0.10$. Show
that the sequential probability ratio test can be based upon the statistic
$\sum_1^n X_i^2$. Determine $c_0(n)$ and $c_1(n)$.

9.38. Let $X$ have a Poisson distribution with mean $\theta$. Find the
sequential probability ratio test for testing $H_0: \theta = 0.02$ against
$H_1: \theta = 0.07$. Show that this test can be based upon the statistic
$\sum_1^n X_i$. If $\alpha_a = 0.20$ and $\beta_a = 0.10$, find $c_0(n)$
and $c_1(n)$.

9.39. Let the independent random variables $Y$ and $Z$ be $N(\mu_1, 1)$ and
$N(\mu_2, 1)$, respectively. Let $\theta = \mu_1 - \mu_2$. Let us observe
independent observations from each distribution, say $Y_1, Y_2, \ldots$ and
$Z_1, Z_2, \ldots$. To test sequentially the hypothesis $H_0: \theta = 0$
against $H_1: \theta = \frac12$, use the sequence $X_i = Y_i - Z_i$,
$i = 1, 2, \ldots$. If $\alpha_a = \beta_a = 0.05$, show that the test can
be based upon $\bar X = \bar Y - \bar Z$. Find $c_0(n)$ and $c_1(n)$.

9.40. Say that a manufacturing process makes about 3 percent defective
items, which is considered satisfactory for this particular product. The
managers would like to decrease this to about 1 percent and clearly want
to guard against a substantial increase, say to 5 percent. To monitor the
process, periodically $n = 100$ items are taken and the number $X$ of
defectives counted. Assume that $X$ is $b(n = 100, p = \theta)$. Based on a
sequence $X_1, X_2, \ldots, X_m, \ldots$, determine a sequential probability
ratio test that tests $H_0: \theta = 0.01$ against $H_1: \theta = 0.05$.
(Note that $\theta = 0.03$, the present level, is in between these two
values.) Write this test in the form

\[ h_0 > \sum_{i=1}^m (x_i - nd) > h_1 \]

and determine $d$, $h_0$, and $h_1$ if $\alpha_a = \beta_a = 0.02$.


9.5 Minimax, Bayesian, and Classification Procedures

In Chapters 7 and 8 we considered several procedures which may
be used in problems of point estimation. Among these were decision
function procedures (in particular, minimax decisions) and Bayesian
procedures. In this section, we apply these same principles to the
problem of testing a simple hypothesis $H_0$ against an alternative simple
hypothesis $H_1$. It is important to observe that each of these procedures
yields, in accordance with the Neyman-Pearson theorem, a best test
of $H_0$ against $H_1$.

We first investigate the decision function approach to the problem
of testing a simple hypothesis against a simple alternative hypothesis.
Let the joint p.d.f. of $n$ random variables $X_1, X_2, \ldots, X_n$ depend
upon the parameter $\theta$. Here $n$ is a fixed positive integer. This
p.d.f. is denoted by $L(\theta; x_1, x_2, \ldots, x_n)$ or, for brevity, by
$L(\theta)$. Let $\theta'$ and $\theta''$ be distinct and fixed values of
$\theta$. We wish to test the simple hypothesis $H_0: \theta = \theta'$
against the simple hypothesis $H_1: \theta = \theta''$. Thus the
parameter space is $\Omega = \{\theta : \theta = \theta', \theta''\}$. In
accordance with the decision function procedure, we need a function $\delta$
of the observed values of $X_1, \ldots, X_n$ (or, of the observed value of a
statistic $Y$) that decides which of the two values of $\theta$, $\theta'$
or $\theta''$, to accept. That is, the function $\delta$ selects either
$H_0: \theta = \theta'$ or $H_1: \theta = \theta''$. We denote these
decisions by $\delta = \theta'$ and $\delta = \theta''$, respectively. Let
$\mathscr{L}(\theta, \delta)$ represent the loss function associated with
this decision problem. Because the pairs
$(\theta = \theta', \delta = \theta')$ and
$(\theta = \theta'', \delta = \theta'')$ represent correct decisions, we
shall always take
$\mathscr{L}(\theta', \theta') = \mathscr{L}(\theta'', \theta'') = 0$. On
the other hand, if either $\delta = \theta''$ when $\theta = \theta'$ or
$\delta = \theta'$ when $\theta = \theta''$, then a positive value
should be assigned to the loss function; that is,
$\mathscr{L}(\theta', \theta'') > 0$ and
$\mathscr{L}(\theta'', \theta') > 0$.

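It has previously been emphasized that a test of $H_0: \theta = \theta'$
against $H_1: \theta = \theta''$ can be described in terms of a critical
region in the sample space. We can do the same kind of thing with the
decision function. That is, we can choose a subset $C$ of the sample space
and if $(x_1, x_2, \ldots, x_n) \in C$, we can make the decision
$\delta = \theta''$; whereas, if $(x_1, x_2, \ldots, x_n) \in C^*$, the
complement of $C$, we make the decision $\delta = \theta'$. Thus a given
critical region $C$ determines the decision function. In this sense, we may
denote the risk function by $R(\theta, C)$ instead of $R(\theta, \delta)$.
That is, in a notation used in Section 9.1,

\[ R(\theta, C) = R(\theta, \delta)
   = \int_{C \cup C^*} \mathscr{L}(\theta, \delta) L(\theta). \]

Since $\delta = \theta''$ if $(x_1, \ldots, x_n) \in C$ and
$\delta = \theta'$ if $(x_1, \ldots, x_n) \in C^*$, we have

\[ R(\theta, C) = \int_C \mathscr{L}(\theta, \theta'') L(\theta)
   + \int_{C^*} \mathscr{L}(\theta, \theta') L(\theta). \tag{1} \]

If, in Equation (1), we take $\theta = \theta'$, then
$\mathscr{L}(\theta', \theta') = 0$ and hence

\[ R(\theta', C) = \int_C \mathscr{L}(\theta', \theta'') L(\theta')
   = \mathscr{L}(\theta', \theta'') \int_C L(\theta'). \]

If in Equation (1) we let $\theta = \theta''$, then
$\mathscr{L}(\theta'', \theta'') = 0$ and

\[ R(\theta'', C) = \int_{C^*} \mathscr{L}(\theta'', \theta') L(\theta'')
   = \mathscr{L}(\theta'', \theta') \int_{C^*} L(\theta''). \]

If $K(\theta)$ denotes the power function of the test associated with the
critical region $C$, then

\[ R(\theta', C) = \mathscr{L}(\theta', \theta'') K(\theta')
   = \mathscr{L}(\theta', \theta'') \alpha, \]

where $\alpha = K(\theta')$ is the significance level; and

\[ R(\theta'', C) = \mathscr{L}(\theta'', \theta') [1 - K(\theta'')]
   = \mathscr{L}(\theta'', \theta') \beta, \]

where $\beta = 1 - K(\theta'')$ is the probability of the type II error.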

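Let us now look for a minimax solution to our problem.
That is, we want to find a critical region $C$ so that

\[ \max\,[R(\theta', C),\ R(\theta'', C)] \]

is minimized. We shall show that the solution is the region

\[ C = \left\{ (x_1, \ldots, x_n) :
   \frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)}
   \le k \right\}, \]

provided the positive constant $k$ is selected so that
$R(\theta', C) = R(\theta'', C)$. That is, if $k$ is chosen so that

\[ \mathscr{L}(\theta', \theta'') \int_C L(\theta')
   = \mathscr{L}(\theta'', \theta') \int_{C^*} L(\theta''), \]

then the critical region $C$ provides a minimax solution. In the case of
random variables of the continuous type, $k$ can always be selected so
that $R(\theta', C) = R(\theta'', C)$. However, with random variables of the
discrete type, we may need to consider an auxiliary random experiment
when $L(\theta')/L(\theta'') = k$ in order to achieve the exact equality
$R(\theta', C) = R(\theta'', C)$.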

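To see that this region $C$ is the minimax solution, consider every
other region $A$ for which $R(\theta', C) \ge R(\theta', A)$. Obviously, a
region $A$ for which $R(\theta', C) < R(\theta', A)$ is not a candidate for
a minimax solution, for then
$R(\theta', C) = R(\theta'', C) < \max\,[R(\theta', A), R(\theta'', A)]$.
Since $R(\theta', C) \ge R(\theta', A)$ means that

\[ \mathscr{L}(\theta', \theta'') \int_C L(\theta')
   \ge \mathscr{L}(\theta', \theta'') \int_A L(\theta'), \]

we have

\[ \alpha = \int_C L(\theta') \ge \int_A L(\theta'). \]

That is, the significance level of the test associated with the critical
region $A$ is less than or equal to $\alpha$. But $C$, in accordance with
the Neyman-Pearson theorem, is a best critical region of size $\alpha$. Thus

\[ \int_C L(\theta'') \ge \int_A L(\theta'') \]

and

\[ \int_{C^*} L(\theta'') \le \int_{A^*} L(\theta''). \]

Accordingly,

\[ \mathscr{L}(\theta'', \theta') \int_{C^*} L(\theta'')
   \le \mathscr{L}(\theta'', \theta') \int_{A^*} L(\theta''), \]

or, equivalently,

\[ R(\theta'', C) \le R(\theta'', A). \]

That is,

\[ R(\theta', C) = R(\theta'', C) \le R(\theta'', A). \]

This means that

\[ \max\,[R(\theta', C), R(\theta'', C)] \le R(\theta'', A). \]

Then certainly,

\[ \max\,[R(\theta', C), R(\theta'', C)]
   \le \max\,[R(\theta', A), R(\theta'', A)], \]

and the critical region $C$ provides a minimax solution, as we wanted
to show.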

Example 1. Let $X_1, X_2, \ldots, X_{100}$ denote a random sample of size
100 from a distribution that is $N(\theta, 100)$. We again consider the
problem of testing $H_0: \theta = 75$ against $H_1: \theta = 78$. We seek a
minimax solution with $\mathscr{L}(75, 78) = 3$ and
$\mathscr{L}(78, 75) = 1$. Since $L(75)/L(78) \le k$ is equivalent to
$\bar x \ge c$, we want to determine $c$, and thus $k$, so that

\[ 3 \Pr(\bar X \ge c;\ \theta = 75) = \Pr(\bar X < c;\ \theta = 78). \]

Because $\bar X$ is $N(\theta, 1)$, the preceding equation can be rewritten as

\[ 3[1 - \Phi(c - 75)] = \Phi(c - 78). \]

If we use Table III of the appendix, we see, by trial and error, that the
solution is $c = 76.8$, approximately. The significance level of the test is
$1 - \Phi(1.8) = 0.036$, approximately, and the power of the test when $H_1$
is true is $1 - \Phi(-1.2) = 0.885$, approximately.
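The trial-and-error search for $c$ can also be carried out numerically. The
sketch below solves $3[1 - \Phi(c - 75)] = \Phi(c - 78)$ by bisection, using
the error function for $\Phi$; the function names and the bracketing interval
$[75, 78]$ are assumptions made for this illustration.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def minimax_cutoff(lo=75.0, hi=78.0, tol=1e-10):
    """Solve 3[1 - Phi(c - 75)] = Phi(c - 78) for c by bisection."""
    def g(c):
        # g is decreasing in c: positive at 75, negative at 78.
        return 3 * (1 - phi(c - 75)) - phi(c - 78)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


c = minimax_cutoff()
print(round(c, 2))               # cutoff for the sample mean
print(round(1 - phi(c - 75), 3))  # significance level
print(round(1 - phi(c - 78), 3))  # power when H1 is true
```

The root lies between 76.7 and 76.8, in agreement with the tabled value
$c = 76.8$ to the accuracy of the appendix table.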

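Next, let us consider the Bayesian approach to the problem of
testing the simple hypothesis $H_0: \theta = \theta'$ against the simple
hypothesis $H_1: \theta = \theta''$. We continue to use the notation already
presented in this section. In addition, we recall that we need the p.d.f.
$h(\theta)$ of the random variable $\Theta$. Since the parameter space
consists of but two points, $\theta'$ and $\theta''$, $\Theta$ is a random
variable of the discrete type; and we have $h(\theta') + h(\theta'') = 1$.
Since $L(\theta; x_1, x_2, \ldots, x_n) = L(\theta)$ is the conditional
p.d.f. of $X_1, X_2, \ldots, X_n$, given $\Theta = \theta$, the joint p.d.f.
of $X_1, X_2, \ldots, X_n$ and $\Theta$ is

\[ h(\theta) L(\theta; x_1, x_2, \ldots, x_n) = h(\theta) L(\theta). \]

Because

\[ \sum_{\Omega} h(\theta) L(\theta)
   = h(\theta') L(\theta') + h(\theta'') L(\theta'') \]

is the marginal p.d.f. of $X_1, X_2, \ldots, X_n$, the conditional p.d.f. of
$\Theta$, given $X_1 = x_1, \ldots, X_n = x_n$, is

\[ \frac{h(\theta) L(\theta)}
        {h(\theta') L(\theta') + h(\theta'') L(\theta'')}. \]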

Now a Bayes' solution to a decision problem is defined in Section
8.1 as a $\delta(y)$ such that
$E\{\mathscr{L}[\Theta, \delta(y)] \mid Y = y\}$ is a minimum. In this
problem if $\delta = \theta'$, the conditional expectation of
$\mathscr{L}(\theta, \delta)$, given $X_1 = x_1, \ldots, X_n = x_n$, is

\[ \frac{\mathscr{L}(\theta'', \theta') h(\theta'') L(\theta'')}
        {h(\theta') L(\theta') + h(\theta'') L(\theta'')}, \]

because $\mathscr{L}(\theta', \theta') = 0$; and if $\delta = \theta''$,
this expectation is

\[ \frac{\mathscr{L}(\theta', \theta'') h(\theta') L(\theta')}
        {h(\theta') L(\theta') + h(\theta'') L(\theta'')}, \]

because $\mathscr{L}(\theta'', \theta'') = 0$. Accordingly, the Bayes'
solution requires that the decision $\delta = \theta''$ be made if

\[ \frac{\mathscr{L}(\theta', \theta'') h(\theta') L(\theta')}
        {h(\theta') L(\theta') + h(\theta'') L(\theta'')}
   < \frac{\mathscr{L}(\theta'', \theta') h(\theta'') L(\theta'')}
          {h(\theta') L(\theta') + h(\theta'') L(\theta'')}, \]

or, equivalently, if

\[ \frac{L(\theta')}{L(\theta'')}
   < \frac{\mathscr{L}(\theta'', \theta') h(\theta'')}
          {\mathscr{L}(\theta', \theta'') h(\theta')}. \tag{2} \]

If the sign of inequality in expression (2) is reversed, we make the
decision $\delta = \theta'$; and if the two members of expression (2) are
equal, we can use some auxiliary random experiment to make the decision. It
is important to note that expression (2) describes, in accordance with the
Neyman-Pearson theorem, a best test.

Example 2. In addition to the information given in Example 1, suppose
that we know the prior probabilities for $\theta = \theta' = 75$ and for
$\theta = \theta'' = 78$ to be given, respectively, by $h(75) = \frac17$ and
$h(78) = \frac67$. Then the Bayes' solution is, in this case,

\[ \frac{L(75)}{L(78)} < \frac{(1)(\frac67)}{(3)(\frac17)} = 2, \]

which is equivalent to $\bar x > 76.3$, approximately. The power of the test
when $H_0$ is true is $1 - \Phi(1.3) = 0.097$, approximately, and the power
of the test when $H_1$ is true is $1 - \Phi(-1.7) = \Phi(1.7) = 0.955$,
approximately.
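The cutoff of the Bayes' test can be checked numerically. The sketch below
assumes the setting of Example 1 ($n = 100$, variance 100, so that
$L(75)/L(78) = \exp(-3\bar x + 229.5)$), losses 3 and 1, and prior
probabilities $\frac17$ and $\frac67$, which together give $k = 2$. The
function names are hypothetical.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def bayes_cutoff(loss_75_78=3.0, loss_78_75=1.0, h75=1 / 7, h78=6 / 7):
    """Return (k, c) such that L(75)/L(78) < k is the same as xbar > c.

    Assumes n = 100 and variance 100, for which
    L(75)/L(78) = exp(-3 * xbar + 229.5).
    """
    k = (loss_78_75 * h78) / (loss_75_78 * h75)
    c = (229.5 - math.log(k)) / 3
    return k, c


k, c = bayes_cutoff()
print(round(k, 6), round(c, 2))
print(round(1 - phi(c - 75), 3))  # power when H0 is true
print(round(1 - phi(c - 78), 3))  # power when H1 is true
```

The exact cutoff is about 76.27, which the text rounds to 76.3 before reading
the two powers from the normal table.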


In summary, we make the following comments. In testing the
simple hypothesis $H_0: \theta = \theta'$ against the simple hypothesis
$H_1: \theta = \theta''$, it is emphasized that each principle leads to
critical regions of the form

\[ \left\{ (x_1, x_2, \ldots, x_n) :
   \frac{L(\theta'; x_1, \ldots, x_n)}{L(\theta''; x_1, \ldots, x_n)}
   \le k \right\}, \]

where $k$ is a positive constant. In the classical approach, we determine
$k$ by requiring that the power function of the test have a certain value
at the point $\theta = \theta'$ or at the point $\theta = \theta''$
(usually, the value $\alpha$ at the point $\theta = \theta'$). The minimax
decision requires $k$ to be selected so that

\[ \mathscr{L}(\theta', \theta'') \int_C L(\theta')
   = \mathscr{L}(\theta'', \theta') \int_{C^*} L(\theta''). \]

Finally, the Bayes' procedure requires that

\[ k = \frac{\mathscr{L}(\theta'', \theta') h(\theta'')}
            {\mathscr{L}(\theta', \theta'') h(\theta')}. \]

Each of these tests is a best test for testing a simple hypothesis
$H_0: \theta = \theta'$ against a simple alternative hypothesis
$H_1: \theta = \theta''$.

The summary above has an interesting application to the problem
of classification, which can be described as follows. An investigator
makes a number of measurements on an item and wants to place it into
one of several categories (or classify it). For convenience in our
discussion, we assume that only two measurements, say $X$ and $Y$, are
made on the item to be classified. Moreover, let $X$ and $Y$ have a joint
p.d.f. $f(x, y; \theta)$, where the parameter $\theta$ represents one or
more parameters. In our simplification, suppose that there are only two
possible joint distributions (categories) for $X$ and $Y$, which are indexed
by the parameter values $\theta'$ and $\theta''$, respectively. In this
case, the problem then reduces to one of observing $X = x$ and $Y = y$ and
then testing the hypothesis $\theta = \theta'$ against the hypothesis
$\theta = \theta''$, with the classification of $X$ and $Y$ being in accord
with which hypothesis is accepted. From the Neyman-Pearson theorem, we know
that a best decision of this sort is of the form: If

\[ \frac{f(x, y; \theta')}{f(x, y; \theta'')} \le k, \]

choose the distribution indexed by $\theta''$; that is, we classify $(x, y)$
as coming from the distribution indexed by $\theta''$. Otherwise, choose the
distribution indexed by $\theta'$; that is, we classify $(x, y)$ as coming
from the distribution indexed by $\theta'$. Here $k$ can be selected by
considering the power function, a minimax decision, or a Bayes' procedure.
We favor the latter if the losses and prior probabilities are known.

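Example 3. Let $(x, y)$ be an observation of the random pair $(X, Y)$, which
has a bivariate normal distribution with parameters $\mu_1$, $\mu_2$,
$\sigma_1^2$, $\sigma_2^2$, and $\rho$. In Section 3.5 that joint p.d.f. is
given by

\[ f(x, y; \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, \rho)
   = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}
     \, e^{-q(x, y;\, \mu_1, \mu_2)/2},
   \qquad -\infty < x < \infty,\ -\infty < y < \infty, \]

where $\sigma_1 > 0$, $\sigma_2 > 0$, $-1 < \rho < 1$, and

\[ q(x, y; \mu_1, \mu_2) = \frac{1}{1 - \rho^2} \left[
   \left(\frac{x - \mu_1}{\sigma_1}\right)^2
   - 2\rho \left(\frac{x - \mu_1}{\sigma_1}\right)
           \left(\frac{y - \mu_2}{\sigma_2}\right)
   + \left(\frac{y - \mu_2}{\sigma_2}\right)^2 \right]. \]

Assume that $\sigma_1^2$, $\sigma_2^2$, and $\rho$ are known but that we do
not know whether the respective means of $(X, Y)$ are $(\mu_1', \mu_2')$ or
$(\mu_1'', \mu_2'')$. The inequality

\[ \frac{f(x, y; \mu_1', \mu_2', \sigma_1^2, \sigma_2^2, \rho)}
        {f(x, y; \mu_1'', \mu_2'', \sigma_1^2, \sigma_2^2, \rho)} \le k \]

is equivalent to

\[ \tfrac12 \left[ q(x, y; \mu_1'', \mu_2'') - q(x, y; \mu_1', \mu_2')
   \right] \le \ln k. \]

Moreover, it is clear that the difference in the left-hand member of this
inequality does not contain terms involving $x^2$, $xy$, and $y^2$. In
particular, this inequality is the same as

\[ \frac{1}{1 - \rho^2} \left\{
   \left[ \frac{\mu_1' - \mu_1''}{\sigma_1^2}
        - \frac{\rho(\mu_2' - \mu_2'')}{\sigma_1\sigma_2} \right] x
 + \left[ \frac{\mu_2' - \mu_2''}{\sigma_2^2}
        - \frac{\rho(\mu_1' - \mu_1'')}{\sigma_1\sigma_2} \right] y
   \right\}
   \le \ln k + \tfrac12 \left[ q(0, 0; \mu_1', \mu_2')
   - q(0, 0; \mu_1'', \mu_2'') \right] \tag{3} \]

or, for brevity,

\[ ax + by \le c. \]

That is, if this linear function of $x$ and $y$ in the left-hand member of
inequality (3) is less than or equal to a certain constant, we would classify
that $(x, y)$ as coming from the bivariate normal distribution with means
$\mu_1''$ and $\mu_2''$. Otherwise, we would classify $(x, y)$ as arising
from the bivariate normal distribution with means $\mu_1'$ and $\mu_2'$. Of
course, if the prior probabilities and losses are given, $k$ and thus $c$ can
be found easily; this will be illustrated in Exercise 9.43.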



Once the rule for classification is established, the statistician might
be interested in the two probabilities of misclassifications using that
rule. The first of these two is associated with the classification of
$(x, y)$ as arising from the distribution indexed by $\theta''$ if, in fact,
it comes from that indexed by $\theta'$. The second misclassification is
similar, but with the interchange of $\theta'$ and $\theta''$. In the
preceding example, the probabilities of these respective misclassifications
are

\[ \Pr(aX + bY \le c;\ \mu_1', \mu_2') \quad \text{and} \quad
   \Pr(aX + bY > c;\ \mu_1'', \mu_2''). \]

Fortunately, the distribution of $Z = aX + bY$ is easy to determine,
so each of these probabilities is easy to calculate. The m.g.f. of $Z$ is

\[ E(e^{tZ}) = E[e^{t(aX + bY)}] = E(e^{atX + btY}). \]

Hence in the joint m.g.f. of $X$ and $Y$ found in Section 3.5, simply replace
$t_1$ by $at$ and $t_2$ by $bt$ to obtain

\[ E(e^{tZ}) = \exp\left[ \mu_1 at + \mu_2 bt
   + \frac{\sigma_1^2 (at)^2 + 2\rho\sigma_1\sigma_2 (at)(bt)
   + \sigma_2^2 (bt)^2}{2} \right] \]
\[ \phantom{E(e^{tZ})} = \exp\left[ (a\mu_1 + b\mu_2) t
   + \frac{(a^2\sigma_1^2 + 2ab\rho\sigma_1\sigma_2 + b^2\sigma_2^2) t^2}{2}
   \right]. \]

However, this is the m.g.f. of the normal distribution

\[ N(a\mu_1 + b\mu_2,\ a^2\sigma_1^2 + 2ab\rho\sigma_1\sigma_2
   + b^2\sigma_2^2). \]

With this information, it is easy to compute the probabilities of
misclassifications, and this will also be demonstrated in Exercise 9.43.
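The linear rule of inequality (3) and the two misclassification probabilities
can be computed directly from the formulas above. The following is a sketch
under stated assumptions: the function names are hypothetical, and the
numerical case (means $(0, 0)$ and $(1, 1)$, unit variances, $\rho = 0$,
$k = 1$) is chosen only because it is easy to check by hand.

```python
import math


def phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def linear_rule(mu_p, mu_pp, var1, var2, rho, k):
    """Coefficients (a, b, c) of the rule ax + by <= c from inequality (3).

    mu_p and mu_pp are the mean pairs (mu_1', mu_2') and (mu_1'', mu_2'').
    """
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    d1, d2 = mu_p[0] - mu_pp[0], mu_p[1] - mu_pp[1]
    f = 1 / (1 - rho ** 2)
    a = f * (d1 / var1 - rho * d2 / (s1 * s2))
    b = f * (d2 / var2 - rho * d1 / (s1 * s2))

    def q0(mu):
        # q(0, 0; mu_1, mu_2) from the bivariate normal exponent
        u, v = mu[0] / s1, mu[1] / s2
        return f * (u * u - 2 * rho * u * v + v * v)

    c = math.log(k) + 0.5 * (q0(mu_p) - q0(mu_pp))
    return a, b, c


def misclassification_probs(a, b, c, mu_p, mu_pp, var1, var2, rho):
    """Pr(aX + bY <= c; mu') and Pr(aX + bY > c; mu'')."""
    s1, s2 = math.sqrt(var1), math.sqrt(var2)
    sd = math.sqrt(a * a * var1 + 2 * a * b * rho * s1 * s2 + b * b * var2)
    first = phi((c - (a * mu_p[0] + b * mu_p[1])) / sd)
    second = 1 - phi((c - (a * mu_pp[0] + b * mu_pp[1])) / sd)
    return first, second


# Hypothetical illustrative case, not taken from the text.
a, b, c = linear_rule((0, 0), (1, 1), 1.0, 1.0, 0.0, 1.0)
print(a, b, c)
print(misclassification_probs(a, b, c, (0, 0), (1, 1), 1.0, 1.0, 0.0))
```

In this case the rule reduces to $-x - y \le -1$, that is, $x + y \ge 1$
classifies $(x, y)$ with the means $(1, 1)$, and by symmetry the two
misclassification probabilities are equal.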


One final remark must be made with respect to the use of the
important classification rule established in Example 3. In most
instances the parameter values $\mu_1'$, $\mu_2'$ and $\mu_1''$, $\mu_2''$
as well as $\sigma_1^2$, $\sigma_2^2$, and $\rho$ are unknown. In such cases
the statistician has usually observed a random sample (frequently called a
training sample) from each of the two distributions. Let us say the samples
have sizes $n'$ and $n''$, respectively, with sample characteristics

\[ \bar x',\ \bar y',\ (s_x')^2,\ (s_y')^2,\ r' \quad \text{and} \quad
   \bar x'',\ \bar y'',\ (s_x'')^2,\ (s_y'')^2,\ r''. \]

Accordingly, if in inequality (3) the parameters $\mu_1'$, $\mu_2'$,
$\mu_1''$, $\mu_2''$, $\sigma_1^2$, $\sigma_2^2$, and $\rho\sigma_1\sigma_2$
are replaced by the unbiased estimates

\[ \bar x',\ \bar y',\ \bar x'',\ \bar y'',\
   \frac{n'(s_x')^2 + n''(s_x'')^2}{n' + n'' - 2},\
   \frac{n'(s_y')^2 + n''(s_y'')^2}{n' + n'' - 2},\
   \frac{n' r' s_x' s_y' + n'' r'' s_x'' s_y''}{n' + n'' - 2}, \]

the resulting expression in the left-hand member is frequently called
Fisher's linear discriminant function. Since those parameters have been
estimated, the distribution theory associated with $aX + bY$ is not
appropriate for Fisher's function. However, if $n'$ and $n''$ are large, the
distribution of $aX + bY$ does provide an approximation.

Although we have considered only bivariate distributions in this
section, the results can easily be extended to multivariate normal
distributions after a study of Sections 4.10, 10.8, and 10.9.


EXERCISES 

9.41. Let $X_1, X_2, \ldots, X_{20}$ be a random sample of size 20 from a
distribution which is $N(\theta, 5)$. Let $L(\theta)$ represent the joint
p.d.f. of $X_1, X_2, \ldots, X_{20}$. The problem is to test
$H_0: \theta = 1$ against $H_1: \theta = 0$. Thus
$\Omega = \{\theta : \theta = 0, 1\}$.

(a) Show that $L(1)/L(0) \le k$ is equivalent to $\bar x \le c$.

(b) Find $c$ so that the significance level is $\alpha = 0.05$. Compute the
power of this test if $H_1$ is true.

(c) If the loss function is such that
$\mathscr{L}(1, 1) = \mathscr{L}(0, 0) = 0$ and
$\mathscr{L}(1, 0) = \mathscr{L}(0, 1) > 0$, find the minimax test. Evaluate
the power function of this test at the points $\theta = 1$ and $\theta = 0$.

(d) If, in addition, the prior probabilities of $\theta = 1$ and
$\theta = 0$ are,
respectively, A(l) = | and A(0) = ^ find the Bayes 1 test. Evaluate the 
power function of this test at the points $\theta = 1$ and $\theta = 0$.

9.42. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a
Poisson distribution with parameter $\theta$. Let $L(\theta)$ be the joint
p.d.f. of $X_1, X_2, \ldots, X_{10}$. The problem is to test
$H_0: \theta = \frac12$ against $H_1: \theta = 1$.

(a) Show that $L(\frac12)/L(1) \le k$ is equivalent to
$y = \sum_1^{10} x_i \ge c$.

(b) In order to make $\alpha = 0.05$, show that $H_0$ is rejected if $y > 9$
and, if $y = 9$, reject $H_0$ with probability $\frac12$ (using some
auxiliary random experiment).

(c) If the loss function is such that
$\mathscr{L}(\frac12, \frac12) = \mathscr{L}(1, 1) = 0$ and
$\mathscr{L}(\frac12, 1) = 1$ and $\mathscr{L}(1, \frac12) = 2$, show that
the minimax procedure is to reject $H_0$ if $y > 6$ and, if $y = 6$, reject
$H_0$ with probability 0.08 (using some auxiliary random experiment).

(d) If, in addition, we are given that the prior probabilities of
$\theta = \frac12$ and $\theta = 1$ are $h(\frac12) = \frac13$ and
$h(1) = \frac23$, respectively, show that the Bayes' solution is to reject
$H_0$ if $y > 5.2$, that is, reject $H_0$ if $y \ge 6$.

9.43. In Example 3 let $\mu_1' = \mu_2' = 0$, $\mu_1'' = \mu_2'' = 1$,
$\sigma_1^2 = 1$, $\sigma_2^2 = 1$, and $\rho = \frac12$.

(a) Evaluate inequality (3) when the prior probabilities are h(f/ h fjQ — j 

and W) = f and the losses are Sf[6 = (/z；, S = f4)] = 4 

and = fii)] - L 、 

(b) Find the distribution of the linear function $aX + bY$ that results from
part (a).

(c) Compute $\Pr(aX + bY \le c;\ \mu_1' = \mu_2' = 0)$ and
$\Pr(aX + bY > c;\ \mu_1'' = \mu_2'' = 1)$.

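9.44. Let $X$ and $Y$ have the joint p.d.f.

\[ f(x, y; \theta_1, \theta_2) = \frac{1}{\theta_1\theta_2}
   \exp\left(-\frac{x}{\theta_1} - \frac{y}{\theta_2}\right),
   \qquad 0 < x < \infty,\ 0 < y < \infty, \]

zero elsewhere, where $0 < \theta_1$, $0 < \theta_2$. An observation $(x, y)$
arises from the joint distribution with parameters equal to either
$(\theta_1' = 1, \theta_2' = 5)$ or $(\theta_1'' = 3, \theta_2'' = 2)$.
Determine the form of the classification rule.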

9.45. Let $X$ and $Y$ have a joint bivariate normal distribution. An
observation $(x, y)$ arises from the joint distribution with parameters
equal to either

\[ \mu_1' = \mu_2' = 0, \quad (\sigma_1^2)' = (\sigma_2^2)' = 1, \quad
   \rho' = \tfrac12 \]

or

\[ \mu_1'' = \mu_2'' = 1, \quad (\sigma_1^2)'' = 4, \quad
   (\sigma_2^2)'' = 9, \quad \rho'' = \tfrac12. \]

Show that the classification rule involves a second degree polynomial in $x$
and $y$.
9.46. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
one of the two probability density functions
$(1/\rho) f_i[(x - \theta)/\rho]$, $-\infty < \theta < \infty$, $\rho > 0$,
$i = 1, 2$. We wish to decide from which of these distributions the
sample arose. We assign the respective prior probabilities $p_1$ and $p_2$
to $f_1$ and $f_2$, where $p_1 + p_2 = 1$. If the prior p.d.f. assigned to
$\theta$ and $\rho$ is $g(\theta, \rho)$, the posterior probability of $f_i$
is proportional to $p_i I(f_i \mid x_1, \ldots, x_n)$, where

\[ I(f_i \mid x_1, \ldots, x_n) = \int_0^{\infty} \int_{-\infty}^{\infty}
   \left(\frac{1}{\rho}\right)^n
   f_i\left(\frac{x_1 - \theta}{\rho}\right) \cdots
   f_i\left(\frac{x_n - \theta}{\rho}\right)
   g(\theta, \rho) \, d\theta \, d\rho, \qquad i = 1, 2. \]

If the losses associated with the two wrong decisions are equal, we would
select the p.d.f. with the largest posterior probability.

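(a) If $g(\theta, \rho)$ is a vague noninformative prior proportional to
$1/\rho$, show that

\[ I(f_i \mid x_1, \ldots, x_n) = \int_0^{\infty} \int_{-\infty}^{\infty}
   \left(\frac{1}{\rho}\right)^{n+1}
   f_i\left(\frac{x_1 - \theta}{\rho}\right) \cdots
   f_i\left(\frac{x_n - \theta}{\rho}\right) d\theta \, d\rho \]
\[ \phantom{I(f_i \mid x_1, \ldots, x_n)}
   = \int_0^{\infty} \int_{-\infty}^{\infty} \lambda^{n-2}
   f_i(\lambda x_1 - u) \cdots f_i(\lambda x_n - u) \, du \, d\lambda \]

by changing variables through $\theta = u/\lambda$, $\rho = 1/\lambda$.
Hajek and Sidak show that using this last expression, the Bayesian procedure
of selecting $f_2$ over $f_1$ if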

provides a most powerful location and scale invariant test of one model
against another.

(b) Evaluate $I(f_i \mid x_1, \ldots, x_n)$, $i = 1, 2$, given in (a) for
$f_1(x) = \frac12$, $-1 < x < 1$, zero elsewhere, and $f_2(x)$ is the p.d.f.
of $N(0, 1)$. Show that the most powerful location and scale invariant test
for selecting the normal distribution over the uniform is of the form
$(Y_n - Y_1)/S < k$, where $Y_1 < Y_2 < \cdots < Y_n$ are the order
statistics and $S$ is the sample standard deviation.

ADDITIONAL EXERCISES 


9.47. Consider a random sample $X_1, X_2, \ldots, X_n$ from a distribution
with p.d.f. $f(x; \theta) = \theta (1 - x)^{\theta - 1}$, $0 < x < 1$, zero
elsewhere, where $\theta > 0$.

(a) Find the form of the uniformly most powerful test of $H_0: \theta = 1$
against $H_1: \theta > 1$.

(b) What is the likelihood ratio $\lambda$ for testing $H_0: \theta = 1$
against $H_1: \theta \ne 1$?

9.48. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with
p.d.f. $f(x; \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, zero elsewhere.

(a) Find a complete sufficient statistic for $\theta$.

(b) If $\alpha = \beta = \frac{1}{10}$, find the sequential probability
ratio test of $H_0: \theta = 2$ against $H_1: \theta = 3$.

9.49. Let $X$ have a Poisson p.d.f. with parameter $\theta$. We shall use a
random sample of size $n$ to test $H_0: \theta = 1$ against
$H_1: \theta \ne 1$.

(a) Find the likelihood ratio $\lambda$ for making this test.

(b) Show that $\lambda$ can be expressed in terms of $\bar X$, the mean of
the sample, so that the test can be based upon $\bar X$.


9.50. Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_n$ be independent
random samples from two normal distributions $N(\mu_1, \sigma^2)$ and
$N(\mu_2, \sigma^2)$, respectively, where $\sigma^2$ is the common but
unknown variance.

(a) Find the likelihood ratio $\lambda$ for testing
$H_0: \mu_1 = \mu_2 = 0$ against all alternatives.

(b) Rewrite $\lambda$ so that it is a function of a statistic $Z$ which has a
well-known distribution.

(c) Give the distribution of $Z$ under both null and alternative hypotheses.

9.51. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a gamma-type
distribution with alpha equal to 2 and beta equal to $\theta$. Let
$H_0: \theta = 1$ and $H_1: \theta > 1$.

(a) Show that there exists a uniformly most powerful test for $H_0$ against
$H_1$, determine the statistic $Y$ upon which the test may be based, and
indicate the nature of the best critical region.

(b) Find the p.d.f. of the statistic $Y$ in part (a). If we want a
significance level of 0.05, write an equation which can be used to determine
the critical region. Let $K(\theta)$, $\theta \ge 1$, be the power function
of the test. Express the power function as an integral.

9.52. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample
from a bivariate normal distribution with $\mu_1$, $\mu_2$,
$\sigma_1^2 = \sigma_2^2 = \sigma^2$, $\rho = \frac12$, where $\mu_1$,
$\mu_2$, and $\sigma^2 > 0$ are unknown real numbers. Find the likelihood
ratio $\lambda$ for testing $H_0: \mu_1 = \mu_2 = 0$, $\sigma^2$ unknown
against all alternatives. The likelihood ratio $\lambda$ is a function of
what statistic that has a well-known distribution?

9.53. Let $W = (W_1, W_2)$ be an observation from one of two bivariate normal
distributions, I and II, each with $\mu_1 = \mu_2 = 0$ but with the
respective variance-covariance matrices



How would you classify W into I or II? 

9.54. Let $X$ have a Poisson distribution with parameter $\theta$. Find the
sequential probability ratio test for testing $H_0: \theta = 0.05$ against
$H_1: \theta = 0.03$. Write this in the form
$c_0(n) < \sum_{i=1}^n x_i < c_1(n)$, determining $c_0(n)$ and $c_1(n)$ when
$\alpha_a = 0.10$ and $\beta_a = 0.05$.

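9.55. Let $X$ and $Y$ have the joint p.d.f.

\[ f(x, y; \theta_1, \theta_2) = \frac{1}{\theta_1\theta_2}
   \exp\left(-\frac{x}{\theta_1} - \frac{y}{\theta_2}\right),
   \qquad 0 < x < \infty,\ 0 < y < \infty, \]

zero elsewhere, where $0 < \theta_1$, $0 < \theta_2$. An observation $(x, y)$
arises from the joint distribution with $\theta_1' = 10$, $\theta_2' = 5$ or
$\theta_1'' = 3$, $\theta_2'' = 2$. Determine the form of the classification
rule.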
Inferences About Normal Models


10.1 The Distributions of Certain Quadratic Forms

A homogeneous polynomial of degree 2 in $n$ variables is called
a quadratic form in those variables. If both the variables and
the coefficients are real, the form is called a real quadratic form.
Only real quadratic forms will be considered in this book. To
illustrate, the form $X_1^2 + X_1 X_2 + X_2^2$ is a quadratic form in the
two variables $X_1$ and $X_2$; the form
$X_1^2 + X_2^2 + X_3^2 - 2X_1 X_2$ is a quadratic form in the three
variables $X_1$, $X_2$, and $X_3$; but the form
$(X_1 - 1)^2 + (X_2 - 2)^2 = X_1^2 + X_2^2 - 2X_1 - 4X_2 + 5$ is not a
quadratic form in $X_1$ and $X_2$, although it is a quadratic form in the
variables $X_1 - 1$ and $X_2 - 2$.

Let $\bar X$ and $S^2$ denote, respectively, the mean and the variance of a
random sample $X_1, X_2, \ldots, X_n$ from an arbitrary distribution. Thus

\[ nS^2 = \sum_{i=1}^n (X_i - \bar X)^2
   = \sum_{i=1}^n \left( X_i - \frac{X_1 + X_2 + \cdots + X_n}{n}
   \right)^2 \]
\[ \phantom{nS^2} = \frac{n - 1}{n} (X_1^2 + X_2^2 + \cdots + X_n^2)
   - \frac{2}{n} (X_1 X_2 + \cdots + X_1 X_n + \cdots + X_{n-1} X_n) \]

is a quadratic form in the $n$ variables $X_1, X_2, \ldots, X_n$. If the
sample arises from a distribution that is $N(\mu, \sigma^2)$, we know that
the random variable $nS^2/\sigma^2$ is $\chi^2(n - 1)$ regardless of the
value of $\mu$. This fact proved useful in our search for a confidence
interval for $\sigma^2$ when $\mu$ is unknown.

' * 

It has been seen that tests of certain statistical hypotheses require
a statistic that is a quadratic form. For instance, Example 2, Section
9.2, made use of the statistic $\sum_{i=1}^{n} X_i^2$, which is a quadratic form in the
variables $X_1, X_2, \ldots, X_n$. Later in this chapter, tests of other statistical
hypotheses will be investigated, and it will be seen that functions of
statistics that are quadratic forms will be needed to carry out the tests
in an expeditious manner. But first we shall make a study of the
distribution of certain quadratic forms in normal and independent
random variables.

The following theorem will be proved in Section 10.9.

Theorem 1. Let $Q = Q_1 + Q_2 + \cdots + Q_{k-1} + Q_k$, where $Q, Q_1, \ldots, Q_k$
are $k + 1$ random variables that are real quadratic forms in $n$
independent random variables which are normally distributed with the
means $\mu_1, \mu_2, \ldots, \mu_n$ and the same variance $\sigma^2$. Let $Q/\sigma^2$,
$Q_1/\sigma^2, \ldots, Q_{k-1}/\sigma^2$ have chi-square distributions with degrees of
freedom $r, r_1, \ldots, r_{k-1}$, respectively. Let $Q_k$ be nonnegative. Then:

(a) $Q_1, \ldots, Q_k$ are independent, and hence

(b) $Q_k/\sigma^2$ has a chi-square distribution with $r - (r_1 + \cdots + r_{k-1}) = r_k$
degrees of freedom.

Three examples illustrative of the theorem will follow. Each of
these examples will deal with a distribution problem that is based
on the remarks made in the subsequent paragraph.

Let the random variable $X$ have a distribution that is $N(\mu, \sigma^2)$.
Let $a$ and $b$ denote positive integers greater than 1 and let
$n = ab$. Consider a random sample of size $n = ab$ from this normal
distribution. The observations of the random sample will be denoted
by the symbols

$$\begin{array}{ccccccc}
X_{11}, & X_{12}, & \ldots, & X_{1j}, & \ldots, & X_{1b} \\
X_{21}, & X_{22}, & \ldots, & X_{2j}, & \ldots, & X_{2b} \\
\vdots & & & & & \vdots \\
X_{i1}, & X_{i2}, & \ldots, & X_{ij}, & \ldots, & X_{ib} \\
\vdots & & & & & \vdots \\
X_{a1}, & X_{a2}, & \ldots, & X_{aj}, & \ldots, & X_{ab}.
\end{array}$$

In this notation, the first subscript indicates the row, and the
second subscript indicates the column in which the observation
appears. Thus $X_{ij}$ is in row $i$ and column $j$, $i = 1, 2, \ldots, a$ and
$j = 1, 2, \ldots, b$. By assumption these $n = ab$ random variables are
independent, and each has the same normal distribution with
mean $\mu$ and variance $\sigma^2$. Thus, if we wish, we may consider each row
as being a random sample of size $b$ from the given distribution; and we
may consider each column as being a random sample of size $a$ from
the given distribution. We now define $a + b + 1$ statistics. They are

$$\bar{X}_{..} = \frac{X_{11} + \cdots + X_{1b} + \cdots + X_{a1} + \cdots + X_{ab}}{ab} = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} X_{ij}}{ab},$$

$$\bar{X}_{i.} = \frac{X_{i1} + X_{i2} + \cdots + X_{ib}}{b} = \frac{\sum_{j=1}^{b} X_{ij}}{b}, \qquad i = 1, 2, \ldots, a,$$

and

$$\bar{X}_{.j} = \frac{X_{1j} + X_{2j} + \cdots + X_{aj}}{a} = \frac{\sum_{i=1}^{a} X_{ij}}{a}, \qquad j = 1, 2, \ldots, b.$$

The statistic $\bar{X}_{..}$ is the mean of the random sample of size $n = ab$; the
statistics $\bar{X}_{1.}, \bar{X}_{2.}, \ldots, \bar{X}_{a.}$ are, respectively, the means of the rows;
and the statistics $\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.b}$ are, respectively, the means of the
columns. Three examples illustrative of the theorem will follow.
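Example 1. Consider the variance $S^2$ of the random sample of size
$n = ab$. We have the algebraic identity

$$abS^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{..})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} [(X_{ij} - \bar{X}_{i.}) + (\bar{X}_{i.} - \bar{X}_{..})]^2$$

$$= \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{i.} - \bar{X}_{..})^2 + 2\sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})(\bar{X}_{i.} - \bar{X}_{..}).$$

The last term of the right-hand member of this identity may be written

$$2\sum_{i=1}^{a}\left[(\bar{X}_{i.} - \bar{X}_{..})\sum_{j=1}^{b}(X_{ij} - \bar{X}_{i.})\right] = 2\sum_{i=1}^{a}\left[(\bar{X}_{i.} - \bar{X}_{..})(b\bar{X}_{i.} - b\bar{X}_{i.})\right] = 0,$$

and the term

$$\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{i.} - \bar{X}_{..})^2$$

may be written

$$b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2.$$

Thus

$$abS^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.})^2 + b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_1 + Q_2.$$

Clearly, $Q$, $Q_1$, and $Q_2$ are quadratic forms in the $n = ab$ variables $X_{ij}$. We shall
use the theorem with $k = 2$ to show that $Q_1$ and $Q_2$ are independent. Since $S^2$
is the variance of a random sample of size $n = ab$ from the given normal
distribution, then $abS^2/\sigma^2$ has a chi-square distribution with $ab - 1$ degrees of
freedom. Now

$$\frac{Q_1}{\sigma^2} = \sum_{i=1}^{a}\left[\frac{\sum_{j=1}^{b}(X_{ij} - \bar{X}_{i.})^2}{\sigma^2}\right].$$

For each fixed value of $i$, $\sum_{j=1}^{b}(X_{ij} - \bar{X}_{i.})^2/b$ is the variance of a random
sample of size $b$ from the given normal distribution, and, accordingly,
$\sum_{j=1}^{b}(X_{ij} - \bar{X}_{i.})^2/\sigma^2$ has a chi-square distribution with $b - 1$ degrees of freedom.
Because the $X_{ij}$ are independent, $Q_1/\sigma^2$ is the sum of $a$ independent random
variables, each having a chi-square distribution with $b - 1$ degrees of freedom.
Hence $Q_1/\sigma^2$ has a chi-square distribution with $a(b - 1)$ degrees of freedom.
Now $Q_2 = b\sum_{i=1}^{a}(\bar{X}_{i.} - \bar{X}_{..})^2 \ge 0$. In accordance with the theorem, $Q_1$ and
$Q_2$ are independent, and $Q_2/\sigma^2$ has a chi-square distribution with
$ab - 1 - a(b - 1) = a - 1$ degrees of freedom.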

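Example 2. In $abS^2$ replace $X_{ij} - \bar{X}_{..}$ by $(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X}_{..})$ to obtain

$$abS^2 = \sum_{j=1}^{b}\sum_{i=1}^{a} [(X_{ij} - \bar{X}_{.j}) + (\bar{X}_{.j} - \bar{X}_{..})]^2,$$

or

$$abS^2 = \sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2 + a\sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_3 + Q_4.$$

It is easy to show (Exercise 10.1) that $Q_3/\sigma^2$ has a chi-square distribution with
$b(a - 1)$ degrees of freedom. Since $Q_4 = a\sum_{j=1}^{b}(\bar{X}_{.j} - \bar{X}_{..})^2 \ge 0$, the theorem
enables us to assert that $Q_3$ and $Q_4$ are independent and that $Q_4/\sigma^2$ has a
chi-square distribution with $ab - 1 - b(a - 1) = b - 1$ degrees of freedom.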


Example 3. In $abS^2$ replace $X_{ij} - \bar{X}_{..}$ by $(\bar{X}_{i.} - \bar{X}_{..}) + (\bar{X}_{.j} - \bar{X}_{..}) +
(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})$ to obtain (Exercise 10.2)

$$abS^2 = b\sum_{i=1}^{a} (\bar{X}_{i.} - \bar{X}_{..})^2 + a\sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2,$$

or, for brevity,

$$Q = Q_2 + Q_4 + Q_5,$$

where $Q_2$ and $Q_4$ are as defined in Examples 1 and 2. From Examples 1 and
2, $Q/\sigma^2$, $Q_2/\sigma^2$, and $Q_4/\sigma^2$ have chi-square distributions with $ab - 1$, $a - 1$, and
$b - 1$ degrees of freedom, respectively. Since $Q_5 \ge 0$, the theorem asserts that
$Q_2$, $Q_4$, and $Q_5$ are independent and that $Q_5/\sigma^2$ has a chi-square
distribution with $ab - 1 - (a - 1) - (b - 1) = (a - 1)(b - 1)$ degrees of
freedom.

Once these quadratic form statistics have been shown to be independent,
a multiplicity of F-statistics can be defined. For instance,

$$\frac{Q_4/[\sigma^2(b - 1)]}{Q_3/[\sigma^2 b(a - 1)]} = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

has an F-distribution with $b - 1$ and $b(a - 1)$ degrees of freedom; and

$$\frac{Q_4/[\sigma^2(b - 1)]}{Q_5/[\sigma^2(a - 1)(b - 1)]} = \frac{Q_4/(b - 1)}{Q_5/[(a - 1)(b - 1)]}$$

has an F-distribution with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom. In
the subsequent sections it will be seen that some likelihood ratio tests of certain
statistical hypotheses can be based on these F-statistics.
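
The three decompositions of Examples 1, 2, and 3 are purely algebraic, so they can be checked on any small array of numbers. The following is a minimal sketch in Python, using only the standard library; the function name `quadratic_forms` and the data values are illustrative assumptions and do not come from the text.

```python
# Sketch: check Q = Q1 + Q2, Q = Q3 + Q4, and Q = Q2 + Q4 + Q5 numerically.
# The array below is made-up illustrative data (a = 2 rows, b = 3 columns).

def quadratic_forms(x):
    """Return (Q, Q1, Q2, Q3, Q4, Q5) for an a-by-b list of lists x."""
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)                        # X-bar_..
    row_mean = [sum(row) / b for row in x]                              # X-bar_i.
    col_mean = [sum(x[i][j] for i in range(a)) / a for j in range(b)]   # X-bar_.j
    cells = [(i, j) for i in range(a) for j in range(b)]
    q = sum((x[i][j] - grand) ** 2 for i, j in cells)                   # ab * S^2
    q1 = sum((x[i][j] - row_mean[i]) ** 2 for i, j in cells)            # within rows
    q2 = b * sum((m - grand) ** 2 for m in row_mean)                    # among rows
    q3 = sum((x[i][j] - col_mean[j]) ** 2 for i, j in cells)            # within columns
    q4 = a * sum((m - grand) ** 2 for m in col_mean)                    # among columns
    q5 = sum((x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
             for i, j in cells)                                         # remainder
    return q, q1, q2, q3, q4, q5

data = [[4.0, 7.0, 5.0],
        [6.0, 9.0, 8.0]]
q, q1, q2, q3, q4, q5 = quadratic_forms(data)
print(q, q1 + q2, q3 + q4, q2 + q4 + q5)
```

The four printed numbers should agree up to rounding error, whatever array is supplied, because the identities hold for every choice of the $X_{ij}$.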

EXERCISES 

10.1. In Example 2 verify that $Q = Q_3 + Q_4$ and that $Q_3/\sigma^2$ has a chi-square
distribution with $b(a - 1)$ degrees of freedom.

10.2. In Example 3 verify that $Q = Q_2 + Q_4 + Q_5$.

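10.3. Let $X_1, X_2, \ldots, X_n$ be a random sample from a normal distribution
$N(\mu, \sigma^2)$. Show that

$$\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=2}^{n} (X_i - \bar{X}')^2 + \frac{n-1}{n}\,(X_1 - \bar{X}')^2,$$

where $\bar{X} = \sum_{i=1}^{n} X_i/n$ and $\bar{X}' = \sum_{i=2}^{n} X_i/(n - 1)$.

Hint: Replace $X_i - \bar{X}$ by $(X_i - \bar{X}') - (X_1 - \bar{X}')/n$. Show that
$\sum_{i=2}^{n} (X_i - \bar{X}')^2/\sigma^2$ has a chi-square distribution with $n - 2$ degrees of
freedom. Prove that the two terms in the right-hand member are
independent. What then is the distribution of

$$\frac{[(n-1)/n](X_1 - \bar{X}')^2}{\sigma^2}\,?$$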

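10.4. Let $X_{ijk}$, $i = 1, \ldots, a$; $j = 1, \ldots, b$; $k = 1, \ldots, c$, be a random sample
of size $n = abc$ from a normal distribution $N(\mu, \sigma^2)$. Let
$\bar{X}_{...} = \sum_{k=1}^{c}\sum_{j=1}^{b}\sum_{i=1}^{a} X_{ijk}/n$ and $\bar{X}_{i..} = \sum_{k=1}^{c}\sum_{j=1}^{b} X_{ijk}/bc$. Show that

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{i..})^2 + bc\sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2.$$

Show that $\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{i..})^2/\sigma^2$ has a chi-square distribution with
$a(bc - 1)$ degrees of freedom. Prove that the two terms in the right-hand
member are independent. What, then, is the distribution of
$bc\sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2/\sigma^2$? Furthermore, let $\bar{X}_{.j.} = \sum_{k=1}^{c}\sum_{i=1}^{a} X_{ijk}/ac$ and
$\bar{X}_{ij.} = \sum_{k=1}^{c} X_{ijk}/c$. Show that

$$\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{...})^2 = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij.})^2 + bc\sum_{i=1}^{a} (\bar{X}_{i..} - \bar{X}_{...})^2$$

$$+ \; ac\sum_{j=1}^{b} (\bar{X}_{.j.} - \bar{X}_{...})^2 + c\sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij.} - \bar{X}_{i..} - \bar{X}_{.j.} + \bar{X}_{...})^2.$$

Show that the four terms in the right-hand member, when divided by $\sigma^2$,
are independent chi-square variables with $ab(c - 1)$, $a - 1$, $b - 1$, and
$(a - 1)(b - 1)$ degrees of freedom, respectively.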

10.5. Let $X_1, X_2, X_3, X_4$ be a random sample of size $n = 4$ from the normal
distribution $N(0, 1)$. Show that $\sum_{i=1}^{4} (X_i - \bar{X})^2$ equals

$$\frac{(X_1 - X_2)^2}{2} + \frac{[X_3 - (X_1 + X_2)/2]^2}{3/2} + \frac{[X_4 - (X_1 + X_2 + X_3)/3]^2}{4/3}$$

and argue that these three terms are independent, each with a chi-square
distribution with 1 degree of freedom.


10.2 A Test of the Equality of Several Means 

Consider $b$ independent random variables that have normal
distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_b$, respectively, and
unknown but common variance $\sigma^2$. Let $X_{1j}, X_{2j}, \ldots, X_{aj}$ represent a
random sample of size $a$ from the normal distribution with mean $\mu_j$
and variance $\sigma^2$, $j = 1, 2, \ldots, b$. It is desired to test the composite
hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$, $\mu$ unspecified, against all
possible alternative hypotheses $H_1$. A likelihood ratio test will be used.
Here the total parameter space is

$$\Omega = \{(\mu_1, \mu_2, \ldots, \mu_b, \sigma^2): -\infty < \mu_j < \infty, \; 0 < \sigma^2 < \infty\}$$

and

$$\omega = \{(\mu_1, \mu_2, \ldots, \mu_b, \sigma^2): -\infty < \mu_1 = \mu_2 = \cdots = \mu_b = \mu < \infty, \; 0 < \sigma^2 < \infty\}.$$


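The likelihood functions, denoted by $L(\omega)$ and $L(\Omega)$ are, respectively,

$$L(\omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{ab/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu)^2\right]$$

and

$$L(\Omega) = \left(\frac{1}{2\pi\sigma^2}\right)^{ab/2} \exp\left[-\frac{1}{2\sigma^2}\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu_j)^2\right].$$

Now

$$\frac{\partial \ln L(\omega)}{\partial \mu} = \sigma^{-2}\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu)$$

and

$$\frac{\partial \ln L(\omega)}{\partial(\sigma^2)} = -\frac{ab}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu)^2.$$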

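If we equate these partial derivatives to zero, the solutions for $\mu$ and
$\sigma^2$ are, respectively, in $\omega$,

$$\frac{\sum_{j=1}^{b}\sum_{i=1}^{a} x_{ij}}{ab} = \bar{x}_{..}, \qquad \frac{\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}{ab} = v, \tag{1}$$

and these values maximize $L(\omega)$. Furthermore,

$$\frac{\partial \ln L(\Omega)}{\partial \mu_j} = \sigma^{-2}\sum_{i=1}^{a} (x_{ij} - \mu_j), \qquad j = 1, 2, \ldots, b,$$

and

$$\frac{\partial \ln L(\Omega)}{\partial(\sigma^2)} = -\frac{ab}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu_j)^2.$$

If we equate these partial derivatives to zero, the solutions for
$\mu_1, \mu_2, \ldots, \mu_b$, and $\sigma^2$ are, respectively, in $\Omega$,

$$\frac{\sum_{i=1}^{a} x_{ij}}{a} = \bar{x}_{.j}, \quad j = 1, 2, \ldots, b, \qquad \frac{\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}{ab} = w, \tag{2}$$

and these values maximize $L(\Omega)$. These maxima are, respectively,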




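$$L(\hat{\omega}) = \left[\frac{ab}{2\pi\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right]^{ab/2} \exp\left[-\frac{ab\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}{2\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right] = \left[\frac{ab}{2\pi\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right]^{ab/2} e^{-ab/2}$$

and

$$L(\hat{\Omega}) = \left[\frac{ab}{2\pi\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}\right]^{ab/2} e^{-ab/2}.$$

Finally,

$$\lambda = \frac{L(\hat{\omega})}{L(\hat{\Omega})} = \left[\frac{\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2}{\sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{..})^2}\right]^{ab/2}.$$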


In the notation of Section 10.1, the statistics defined by the
functions $\bar{x}_{..}$ and $v$ given by Equations (1) of this section are

$$\bar{X}_{..} = \frac{\sum_{j=1}^{b}\sum_{i=1}^{a} X_{ij}}{ab}$$

and

$$S^2 = \frac{\sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{..})^2}{ab} = \frac{Q}{ab},$$

while the statistics defined by the functions $\bar{x}_{.1}, \bar{x}_{.2}, \ldots, \bar{x}_{.b}$ and $w$
given by Equations (2) in this section are, respectively, $\bar{X}_{.j} = \sum_{i=1}^{a} X_{ij}/a$,
$j = 1, 2, \ldots, b$, and $Q_3/ab = \sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/ab$. Thus, in the
notation of Section 10.1, $\lambda^{2/ab}$ defines the statistic $Q_3/Q$.

We reject the hypothesis $H_0$ if $\lambda \le \lambda_0$. To find $\lambda_0$ so that we have
a desired significance level $\alpha$, we must assume that the hypothesis $H_0$
is true. If the hypothesis $H_0$ is true, the random variables $X_{ij}$ constitute
a random sample of size $n = ab$ from a distribution that is normal with
mean $\mu$ and variance $\sigma^2$. This being the case, it was shown in Example
2, Section 10.1, that $Q = Q_3 + Q_4$, where $Q_4 = a\sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2$; that
$Q_3$ and $Q_4$ are independent; and that $Q_3/\sigma^2$ and $Q_4/\sigma^2$ have chi-square
distributions with $b(a - 1)$ and $b - 1$ degrees of freedom, respectively.
Thus the statistic defined by $\lambda^{2/ab}$ may be written

$$\frac{Q_3}{Q_3 + Q_4} = \frac{1}{1 + Q_4/Q_3}.$$

The significance level of the test of $H_0$ is

$$\alpha = \Pr\left[\frac{1}{1 + Q_4/Q_3} \le \lambda_0^{2/ab}; H_0\right] = \Pr\left[\frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]} \ge c; H_0\right],$$

where

$$c = \frac{b(a - 1)}{b - 1}\left(\lambda_0^{-2/ab} - 1\right).$$

But

$$F = \frac{Q_4/[\sigma^2(b - 1)]}{Q_3/[\sigma^2 b(a - 1)]} = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

has an F-distribution with $b - 1$ and $b(a - 1)$ degrees of freedom.
Hence the test of the composite hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$,
$\mu$ unspecified, against all possible alternatives may
be based on an F-statistic. The constant $c$ is so selected as to
yield the desired value of $\alpha$.
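
The chain from the likelihood ratio $\lambda$ to the F-statistic can be traced numerically. The sketch below is written in plain Python under stated assumptions: the function name `one_way_f` and the three samples are hypothetical, and no critical value $c$ is computed because that requires an F table. It evaluates $Q_3$, $Q_4$, $\lambda$, and $F$, and confirms that $F = \frac{b(a-1)}{b-1}(\lambda^{-2/ab} - 1)$.

```python
# Sketch: likelihood ratio and F-statistic for testing equality of b means,
# each sample of common size a. Data below are illustrative only.

def one_way_f(columns):
    """columns[j] holds the a observations x_1j, ..., x_aj of sample j."""
    b = len(columns)
    a = len(columns[0])
    grand = sum(sum(col) for col in columns) / (a * b)          # x-bar_..
    col_mean = [sum(col) / a for col in columns]                # x-bar_.j
    q3 = sum((x - m) ** 2 for col, m in zip(columns, col_mean) for x in col)
    q4 = a * sum((m - grand) ** 2 for m in col_mean)
    q = sum((x - grand) ** 2 for col in columns for x in col)
    lam = (q3 / q) ** (a * b / 2)                               # likelihood ratio
    f = (q4 / (b - 1)) / (q3 / (b * (a - 1)))                   # F with b-1, b(a-1) d.f.
    return lam, f, q3, q4, q

samples = [[5.0, 7.0, 6.0, 6.0],
           [8.0, 9.0, 7.0, 8.0],
           [4.0, 5.0, 6.0, 5.0]]
lam, f, q3, q4, q = one_way_f(samples)
a, b = len(samples[0]), len(samples)
f_from_lambda = (b * (a - 1) / (b - 1)) * (lam ** (-2 / (a * b)) - 1)
print(lam, f, f_from_lambda)
```

Small values of $\lambda$ correspond to large values of $F$, which is why the rejection region $\lambda \le \lambda_0$ becomes $F \ge c$.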

Remark. It should be pointed out that a test of the equality of the $b$ means
$\mu_j$, $j = 1, 2, \ldots, b$, does not require that we take a random sample of size $a$
from each of the $b$ normal distributions. That is, the samples may be of
different sizes, say $a_1, a_2, \ldots, a_b$. A consideration of this procedure is left to
Exercise 10.6.





















Suppose now that we wish to compute the power of the test of $H_0$
against $H_1$ when $H_0$ is false, that is, when we do not have
$\mu_1 = \mu_2 = \cdots = \mu_b = \mu$. It will be seen in Section 10.3 that when $H_1$ is
true, no longer is $Q_4/\sigma^2$ a random variable that is $\chi^2(b - 1)$. Thus we
cannot use an F-statistic to compute the power of the test when $H_1$ is
true. This problem is discussed in Section 10.3.

An observation should be made in connection with maximizing a
likelihood function with respect to certain parameters. Sometimes it is
easier to avoid the use of the calculus. For example, $L(\Omega)$ of this section
can be maximized with respect to $\mu_j$, for every fixed positive $\sigma^2$, by
minimizing

$$z = \sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \mu_j)^2$$

with respect to $\mu_j$, $j = 1, 2, \ldots, b$. Now $z$ can be written as

$$z = \sum_{j=1}^{b}\sum_{i=1}^{a} [(x_{ij} - \bar{x}_{.j}) + (\bar{x}_{.j} - \mu_j)]^2 = \sum_{j=1}^{b}\sum_{i=1}^{a} (x_{ij} - \bar{x}_{.j})^2 + a\sum_{j=1}^{b} (\bar{x}_{.j} - \mu_j)^2.$$

Since each term in the right-hand member of the preceding equation
is nonnegative, clearly $z$ is a minimum, with respect to $\mu_j$, if we take
$\mu_j = \bar{x}_{.j}$, $j = 1, 2, \ldots, b$.

EXERCISES 

10.6. Let $X_{1j}, X_{2j}, \ldots, X_{a_j j}$ represent independent random samples of
sizes $a_j$ from normal distributions with means $\mu_j$ and variances $\sigma^2$,
$j = 1, 2, \ldots, b$. Show that

$$\sum_{j=1}^{b}\sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{..})^2 = \sum_{j=1}^{b}\sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{.j})^2 + \sum_{j=1}^{b} a_j(\bar{X}_{.j} - \bar{X}_{..})^2,$$

or $Q' = Q_3' + Q_4'$. Here $\bar{X}_{..} = \sum_{j=1}^{b}\sum_{i=1}^{a_j} X_{ij} \big/ \sum_{j=1}^{b} a_j$ and $\bar{X}_{.j} = \sum_{i=1}^{a_j} X_{ij}/a_j$. If
$\mu_1 = \mu_2 = \cdots = \mu_b$, show that $Q'/\sigma^2$ and $Q_3'/\sigma^2$ have chi-square
distributions. Prove that $Q_3'$ and $Q_4'$ are independent, and hence $Q_4'/\sigma^2$ also
has a chi-square distribution. If the likelihood ratio $\lambda$ is used to test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_b = \mu$, $\mu$ unspecified and $\sigma^2$ unknown, against all
possible alternatives, show that $\lambda \le \lambda_0$ is equivalent to the computed $F \ge c$,
where

$$F = \frac{\left(\sum_{j=1}^{b} a_j - b\right) Q_4'}{(b - 1)\,Q_3'}.$$

What is the distribution of $F$ when $H_0$ is true?

10.7. Consider the T-statistic that was derived through a likelihood ratio
for testing the equality of the means of two normal distributions
having common variance in Example 2 in Section 9.3. Show that $T^2$ is
exactly the F-statistic of Exercise 10.6 with $a_1 = n$, $a_2 = m$, and $b = 2$.
Of course, $X_1, \ldots, X_n, \bar{X}$ are replaced with $X_{11}, \ldots, X_{n1}, \bar{X}_{.1}$ and
$Y_1, \ldots, Y_m, \bar{Y}$ by $X_{12}, \ldots, X_{m2}, \bar{X}_{.2}$.

10.8. In Exercise 10.6, show that the linear functions $X_{ij} - \bar{X}_{.j}$ and $\bar{X}_{.j} - \bar{X}_{..}$
are uncorrelated.

Hint: Recall the definitions of $\bar{X}_{.j}$ and $\bar{X}_{..}$ and, without loss of generality,
we can let $E(X_{ij}) = 0$ for all $i$, $j$.

10.9. The following are observations associated with independent random
samples from three normal distributions having equal variances and
respective means $\mu_1$, $\mu_2$, $\mu_3$.

II
o
19 4 2 1
' * ■-_ V
5 12 4 4
2,3,CJ2.H
.5J.0.8
n li n

Compute the F-statistic that is used to test $H_0: \mu_1 = \mu_2 = \mu_3$.

10.10. Using the notation of this section, assume that the means satisfy the
condition that $\mu = \mu_1 + (b - 1)d = \mu_2 - d = \mu_3 - d = \cdots = \mu_b - d$. That
is, the last $b - 1$ means are equal but differ from the first mean $\mu_1$, provided
that $d \ne 0$. Let independent random samples of size $a$ be taken from the $b$
normal distributions with common unknown variance $\sigma^2$.

(a) Show that the maximum likelihood estimators of $\mu$ and $d$ are $\hat{\mu} = \bar{X}_{..}$
and

$$\hat{d} = \frac{\sum_{j=2}^{b} \bar{X}_{.j}/(b - 1) - \bar{X}_{.1}}{b}.$$

(b) Using Exercise 10.3, find $Q_6$ and $Q_7 = c\hat{d}^2$ so that, when $d = 0$, $Q_7/\sigma^2$
is $\chi^2(1)$ and

$$\sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{..})^2 = Q_3 + Q_6 + Q_7.$$

(c) Argue that the three terms in the right-hand member of part (b), once
divided by $\sigma^2$, are independent random variables with chi-square
distributions, provided that $d = 0$.

(d) The ratio $Q_7/(Q_3 + Q_6)$ times what constant has an F-distribution,
provided that $d = 0$? Note that this $F$ is really the square of the
two-sample $T$ used to test the equality of the mean of the first
distribution and the common mean of the other distributions, in which
the last $b - 1$ samples are combined into one.


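10.3 Noncentral $\chi^2$ and Noncentral F

Let $X_1, X_2, \ldots, X_n$ denote independent random variables that are
$N(\mu_i, \sigma^2)$, $i = 1, 2, \ldots, n$, and let $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$. If each $\mu_i$ is zero, we
know that $Y$ is $\chi^2(n)$. We shall now investigate the distribution of $Y$
when each $\mu_i$ is not zero. The m.g.f. of $Y$ is given by

$$M(t) = E\left[\exp\left(t\sum_{i=1}^{n}\frac{X_i^2}{\sigma^2}\right)\right] = \prod_{i=1}^{n} E\left[\exp\left(t\frac{X_i^2}{\sigma^2}\right)\right].$$

Consider

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2}\right] dx_i.$$

The integral exists if $t < \frac{1}{2}$. To evaluate the integral, note that

$$\frac{tx_i^2}{\sigma^2} - \frac{(x_i - \mu_i)^2}{2\sigma^2} = -\frac{x_i^2(1 - 2t)}{2\sigma^2} + \frac{2\mu_i x_i}{2\sigma^2} - \frac{\mu_i^2}{2\sigma^2}$$

$$= \frac{t\mu_i^2}{\sigma^2(1 - 2t)} - \frac{1 - 2t}{2\sigma^2}\left(x_i - \frac{\mu_i}{1 - 2t}\right)^2.$$

Accordingly, with $t < \frac{1}{2}$, we have

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \exp\left[\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right] \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1 - 2t}{2\sigma^2}\left(x_i - \frac{\mu_i}{1 - 2t}\right)^2\right] dx_i.$$

If we multiply the integrand by $\sqrt{1 - 2t}$, $t < \frac{1}{2}$, we have the integral
of a normal p.d.f. with mean $\mu_i/(1 - 2t)$ and variance $\sigma^2/(1 - 2t)$. Thus

$$E\left[\exp\left(\frac{tX_i^2}{\sigma^2}\right)\right] = \frac{1}{\sqrt{1 - 2t}} \exp\left[\frac{t\mu_i^2}{\sigma^2(1 - 2t)}\right],$$

and the m.g.f. of $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$ is given by

$$M(t) = \frac{1}{(1 - 2t)^{n/2}} \exp\left[\frac{t\sum_{i=1}^{n}\mu_i^2}{\sigma^2(1 - 2t)}\right], \qquad t < \tfrac{1}{2}.$$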

A random variable that has an m.g.f. of the functional form

$$M(t) = \frac{1}{(1 - 2t)^{r/2}}\, e^{t\theta/(1 - 2t)},$$

where $t < \frac{1}{2}$, $0 < \theta$, and $r$ is a positive integer, is said to have a
noncentral chi-square distribution with $r$ degrees of freedom and
noncentrality parameter $\theta$. If one sets the noncentrality parameter
$\theta = 0$, one has $M(t) = (1 - 2t)^{-r/2}$, which is the m.g.f. of a random
variable that is $\chi^2(r)$. Such a random variable can appropriately be
called a central chi-square variable. We shall use the symbol $\chi^2(r, \theta)$ to
denote a noncentral chi-square distribution that has the parameters $r$
and $\theta$; and we shall say that a random variable is $\chi^2(r, \theta)$ when that
random variable has this kind of distribution. The symbol $\chi^2(r, 0)$ is
equivalent to $\chi^2(r)$. Thus our random variable $Y = \sum_{i=1}^{n} X_i^2/\sigma^2$ of this
section is $\chi^2\left(n, \sum_{i=1}^{n}\mu_i^2/\sigma^2\right)$. If each $\mu_i$ is equal to zero, then $Y$ is $\chi^2(n, 0)$
or, more simply, $Y$ is $\chi^2(n)$.
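
The m.g.f. just derived can be compared with a simulation. What follows is only a rough sketch in standard-library Python: the means, the value $t = 0.1$, the number of draws, and the function names are all assumptions chosen for illustration. It estimates $E[\exp(tY)]$ by Monte Carlo for $Y = \sum X_i^2/\sigma^2$ and sets it beside the closed form $(1 - 2t)^{-n/2}\exp[t\theta/(1 - 2t)]$ with $\theta = \sum\mu_i^2/\sigma^2$.

```python
import math
import random

def noncentral_chi2_mgf(t, r, theta):
    """M(t) = (1 - 2t)^(-r/2) * exp(t * theta / (1 - 2t)), defined for t < 1/2."""
    if t >= 0.5:
        raise ValueError("the m.g.f. exists only for t < 1/2")
    return (1.0 - 2.0 * t) ** (-r / 2.0) * math.exp(t * theta / (1.0 - 2.0 * t))

def simulate_mgf(t, mus, sigma, n_draws, seed=0):
    """Monte Carlo estimate of E[exp(t * sum(X_i^2) / sigma^2)], X_i ~ N(mu_i, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        y = sum(rng.gauss(mu, sigma) ** 2 for mu in mus) / sigma ** 2
        total += math.exp(t * y)
    return total / n_draws

mus = [1.0, 0.5, -0.5]          # assumed means mu_1, mu_2, mu_3
sigma = 1.0                     # assumed common standard deviation
theta = sum(mu ** 2 for mu in mus) / sigma ** 2   # noncentrality parameter
exact = noncentral_chi2_mgf(0.1, len(mus), theta)
approx = simulate_mgf(0.1, mus, sigma, n_draws=100_000)
print(theta, exact, approx)
```

The two values should be close but not identical, since the simulated figure carries sampling error. Setting every mean to zero reduces the closed form to $(1 - 2t)^{-n/2}$, the central chi-square case.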

The noncentral chi-square variables in which we have interest are
certain quadratic forms, in normally distributed variables, divided by
a variance $\sigma^2$. In our example it is worth noting that the noncentrality
parameter of $\sum_{i=1}^{n} X_i^2/\sigma^2$, which is $\sum_{i=1}^{n}\mu_i^2/\sigma^2$, may be computed by
replacing each $X_i$ in the quadratic form by its mean $\mu_i$, $i = 1, 2, \ldots, n$.
This is no fortuitous circumstance; any quadratic form $Q =
Q(X_1, \ldots, X_n)$ in normally distributed variables, which is such that
$Q/\sigma^2$ is $\chi^2(r, \theta)$, has $\theta = Q(\mu_1, \mu_2, \ldots, \mu_n)/\sigma^2$; and if $Q/\sigma^2$ is a chi-square
variable (central or noncentral) for certain real values of $\mu_1, \mu_2, \ldots, \mu_n$,
it is chi-square (central or noncentral) for all real values of these means.

It should be pointed out that Theorem 1, Section 10.1, is valid
whether the random variables are central or noncentral chi-square
variables.

We next discuss a noncentral F-variable. If $U$ and $V$ are
independent and are, respectively, $\chi^2(r_1)$ and $\chi^2(r_2)$, the random variable
$F$ has been defined by $F = r_2U/r_1V$. Now suppose, in particular, that
$U$ is $\chi^2(r_1, \theta)$, $V$ is $\chi^2(r_2)$, and that $U$ and $V$ are independent. The random
variable $r_2U/r_1V$ is called a noncentral F-variable with $r_1$ and $r_2$ degrees
of freedom and with noncentrality parameter $\theta$. Note that the
noncentrality parameter of $F$ is precisely the noncentrality parameter
of the random variable $U$, which is $\chi^2(r_1, \theta)$.

Tables of noncentral chi-square and noncentral F are available in
the literature. However, like those of noncentral $t$, they are too bulky
to be put in this book.

EXERCISES 

10.11. Let $Y_i$, $i = 1, 2, \ldots, n$, denote independent random variables that
are, respectively, $\chi^2(r_i, \theta_i)$, $i = 1, 2, \ldots, n$. Prove that $Z = \sum_{i=1}^{n} Y_i$ is
$\chi^2\left(\sum_{i=1}^{n} r_i, \sum_{i=1}^{n}\theta_i\right)$.

10.12. Compute the mean and the variance of a random variable that is
$\chi^2(r, \theta)$.

10.13. Compute the mean of a random variable that has a noncentral
F-distribution with degrees of freedom $r_1$ and $r_2 > 2$ and noncentrality
parameter $\theta$.

10.14. Show that the square of a noncentral T random variable is a
noncentral F random variable.

10.15. Let $X_1$ and $X_2$ be two independent random variables. Let $X_1$ and
$Y = X_1 + X_2$ be $\chi^2(r_1, \theta_1)$ and $\chi^2(r, \theta)$, respectively. Here $r_1 < r$ and $\theta_1 \le \theta$.
Show that $X_2$ is $\chi^2(r - r_1, \theta - \theta_1)$.

10.16. In Exercise 10.6, if $\mu_1, \mu_2, \ldots, \mu_b$ are not equal, what are the
distributions of $Q_3'/\sigma^2$, $Q_4'/\sigma^2$, and $F$?


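10.4 Multiple Comparisons

Consider $b$ independent random variables that have normal
distributions with unknown means $\mu_1, \mu_2, \ldots, \mu_b$, respectively, and
with unknown but common variance $\sigma^2$. Let $k_1, k_2, \ldots, k_b$ represent
$b$ known real constants that are not all zero. We want to find a
confidence interval for $\sum_{j=1}^{b} k_j\mu_j$, a linear function of the means
$\mu_1, \mu_2, \ldots, \mu_b$. To do this, we take a random sample $X_{1j}, X_{2j}, \ldots, X_{aj}$
of size $a$ from the distribution $N(\mu_j, \sigma^2)$, $j = 1, 2, \ldots, b$. If we denote
$\sum_{i=1}^{a} X_{ij}/a$ by $\bar{X}_{.j}$, then we know that $\bar{X}_{.j}$ is $N(\mu_j, \sigma^2/a)$, that
$\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/\sigma^2$ is $\chi^2(a - 1)$, and that the two random variables are
independent. Since the independent random samples are taken from
the $b$ distributions, the $2b$ random variables $\bar{X}_{.j}$, $\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2/\sigma^2$,
$j = 1, 2, \ldots, b$, are independent. Moreover, $\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.b}$ and

$$\sum_{j=1}^{b}\sum_{i=1}^{a} \frac{(X_{ij} - \bar{X}_{.j})^2}{\sigma^2}$$

are independent and the latter is $\chi^2[b(a - 1)]$. Let $Z = \sum_{j=1}^{b} k_j\bar{X}_{.j}$. Then
$Z$ is normal with mean $\sum_{j=1}^{b} k_j\mu_j$ and variance $\left(\sum_{j=1}^{b} k_j^2\right)\sigma^2/a$, and $Z$ is
independent of

$$V = \frac{1}{b(a - 1)}\sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2.$$

Hence the random variable

$$T = \frac{\left(\sum_{j=1}^{b} k_j\bar{X}_{.j} - \sum_{j=1}^{b} k_j\mu_j\right)\Big/\sqrt{(\sigma^2/a)\sum_{j=1}^{b} k_j^2}}{\sqrt{V/\sigma^2}} = \frac{\sum_{j=1}^{b} k_j\bar{X}_{.j} - \sum_{j=1}^{b} k_j\mu_j}{\sqrt{\left(\sum_{j=1}^{b} k_j^2\right)V/a}}$$

has a t-distribution with $b(a - 1)$ degrees of freedom. A positive
number $c$ can be found, for certain values of $\alpha$, $0 < \alpha < 1$, such that
$\Pr(-c \le T \le c) = 1 - \alpha$. It follows that the probability is $1 - \alpha$ that

$$\sum_{j=1}^{b} k_j\bar{X}_{.j} - c\sqrt{\left(\sum_{j=1}^{b} k_j^2\right)\frac{V}{a}} \le \sum_{j=1}^{b} k_j\mu_j \le \sum_{j=1}^{b} k_j\bar{X}_{.j} + c\sqrt{\left(\sum_{j=1}^{b} k_j^2\right)\frac{V}{a}}.$$

The experimental values of $\bar{X}_{.j}$, $j = 1, 2, \ldots, b$, and $V$ will provide a
$100(1 - \alpha)$ percent confidence interval for $\sum_{j=1}^{b} k_j\mu_j$.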

It should be observed that the confidence interval for $\sum_{j=1}^{b} k_j\mu_j$
depends upon the particular choice of $k_1, k_2, \ldots, k_b$. It is conceivable
that we may be interested in more than one linear function of
$\mu_1, \mu_2, \ldots, \mu_b$, such as $\mu_2 - \mu_1$, $\mu_3 - (\mu_1 + \mu_2)/2$, or $\mu_1 + \cdots + \mu_b$. We
can, of course, find for each $\sum_{j=1}^{b} k_j\mu_j$ a random interval that has a
preassigned probability of including that particular $\sum_{j=1}^{b} k_j\mu_j$. But how
can we compute the probability that simultaneously these random
intervals include their respective linear functions of $\mu_1, \mu_2, \ldots, \mu_b$? The
following procedure of multiple comparisons, due to Scheffé, is one
solution to this problem.

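The random variable

$$\frac{\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2}{\sigma^2/a}$$

is $\chi^2(b)$ and, because it is a function of $\bar{X}_{.1}, \ldots, \bar{X}_{.b}$ alone, it is
independent of the random variable

$$V = \frac{1}{b(a - 1)}\sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2.$$

Hence the random variable

$$F = \frac{a\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2/b}{V}$$

has an F-distribution with $b$ and $b(a - 1)$ degrees of freedom. From
Table V, for certain values of $\alpha$, we can find a constant $d$ such that
$\Pr(F \le d) = 1 - \alpha$ or

$$\Pr\left[\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2 \le bd\,\frac{V}{a}\right] = 1 - \alpha.$$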

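Note that $\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2$ is the square of the distance, in $b$-dimensional
space, from the point $(\mu_1, \mu_2, \ldots, \mu_b)$ to the random point
$(\bar{X}_{.1}, \bar{X}_{.2}, \ldots, \bar{X}_{.b})$. Consider a space of dimension $b$ and let
$(t_1, t_2, \ldots, t_b)$ denote the coordinates of a point in that space.
An equation of a hyperplane that passes through the point
$(\mu_1, \mu_2, \ldots, \mu_b)$ is given by

$$k_1(t_1 - \mu_1) + k_2(t_2 - \mu_2) + \cdots + k_b(t_b - \mu_b) = 0, \tag{1}$$

where not all the real numbers $k_j$, $j = 1, 2, \ldots, b$, are equal to zero.
The square of the distance from this hyperplane to the point
$(t_1 = \bar{X}_{.1}, t_2 = \bar{X}_{.2}, \ldots, t_b = \bar{X}_{.b})$ is

$$\frac{[k_1(\bar{X}_{.1} - \mu_1) + k_2(\bar{X}_{.2} - \mu_2) + \cdots + k_b(\bar{X}_{.b} - \mu_b)]^2}{k_1^2 + k_2^2 + \cdots + k_b^2}. \tag{2}$$

From the geometry of the situation it follows that $\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2$ is equal
to the maximum of expression (2) with respect to $k_1, k_2, \ldots, k_b$. Thus
the inequality $\sum_{j=1}^{b} (\bar{X}_{.j} - \mu_j)^2 \le (bd)(V/a)$ holds if and only if

$$\frac{\left[\sum_{j=1}^{b} k_j(\bar{X}_{.j} - \mu_j)\right]^2}{\sum_{j=1}^{b} k_j^2} \le bd\,\frac{V}{a} \tag{3}$$

for every real $k_1, k_2, \ldots, k_b$, not all zero. Inequality (3) may be written
in the form

$$\left|\sum_{j=1}^{b} k_j\bar{X}_{.j} - \sum_{j=1}^{b} k_j\mu_j\right| \le \sqrt{bd\left(\sum_{j=1}^{b} k_j^2\right)\frac{V}{a}}.$$

Thus the probability is $1 - \alpha$ that simultaneously, for all real
$k_1, k_2, \ldots, k_b$, not all zero,

$$\sum_{j=1}^{b} k_j\bar{X}_{.j} - \sqrt{bd\left(\sum_{j=1}^{b} k_j^2\right)\frac{V}{a}} \le \sum_{j=1}^{b} k_j\mu_j \le \sum_{j=1}^{b} k_j\bar{X}_{.j} + \sqrt{bd\left(\sum_{j=1}^{b} k_j^2\right)\frac{V}{a}}. \tag{4}$$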


Denote by $A$ the event where inequality (4) is true for all real
$k_1, \ldots, k_b$, and denote by $B$ the event where that inequality is true for
a finite number of $b$-tuples $(k_1, \ldots, k_b)$. If the event $A$ occurs, certainly
the event $B$ occurs. Hence $P(A) \le P(B)$. In the applications, one is often
interested only in a finite number of linear functions $\sum_{j=1}^{b} k_j\mu_j$. Once
the experimental values are available, we obtain from (4) a confidence
interval for each of these linear functions. Since $P(B) \ge P(A) = 1 - \alpha$,
we have a confidence coefficient of at least $100(1 - \alpha)$ percent that the
linear functions are in these respective confidence intervals.
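
Inequality (4) is straightforward to evaluate once $d$ has been read from an F table. The sketch below is an illustration in plain Python, not a definitive implementation: the function names, the three samples, and the contrasts are hypothetical, and the value `d = 3.86` is assumed to be the tabled 95th percentile of F with 3 and 9 degrees of freedom, which should be checked against whatever table is in use.

```python
import math

def pooled_v(columns):
    """V = sum_j sum_i (x_ij - xbar_.j)^2 / [b(a - 1)] for b samples of common size a."""
    b, a = len(columns), len(columns[0])
    means = [sum(col) / a for col in columns]
    ss = sum((x - m) ** 2 for col, m in zip(columns, means) for x in col)
    return ss / (b * (a - 1)), means

def scheffe_interval(k, col_means, v, a, d):
    """Endpoints of inequality (4) for the linear function sum_j k_j * mu_j.

    k: constants k_1..k_b, not all zero; d: constant with Pr(F <= d) = 1 - alpha,
    where F has b and b(a - 1) degrees of freedom (supplied by the user).
    """
    b = len(k)
    center = sum(kj * m for kj, m in zip(k, col_means))
    half = math.sqrt(b * d * sum(kj ** 2 for kj in k) * v / a)
    return center - half, center + half

samples = [[5.0, 7.0, 6.0, 6.0],     # illustrative data, b = 3 samples of size a = 4
           [8.0, 9.0, 7.0, 8.0],
           [4.0, 5.0, 6.0, 5.0]]
v, means = pooled_v(samples)
d = 3.86                              # assumed table value for F(3, 9), alpha = 0.05
lo_12, hi_12 = scheffe_interval([1, -1, 0], means, v, a=4, d=d)
lo_23, hi_23 = scheffe_interval([0, 1, -1], means, v, a=4, d=d)
print((lo_12, hi_12), (lo_23, hi_23))
```

Any number of linear functions may be passed through `scheffe_interval` with the same $d$; the confidence coefficient of at least $100(1 - \alpha)$ percent applies to all of the resulting intervals at once.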


Remarks. If the sample sizes, say $a_1, a_2, \ldots, a_b$, are unequal, inequality
(4) becomes

$$\sum_{j=1}^{b} k_j\bar{X}_{.j} - \sqrt{bd\left(\sum_{j=1}^{b}\frac{k_j^2}{a_j}\right)V} \le \sum_{j=1}^{b} k_j\mu_j \le \sum_{j=1}^{b} k_j\bar{X}_{.j} + \sqrt{bd\left(\sum_{j=1}^{b}\frac{k_j^2}{a_j}\right)V}, \tag{4'}$$

where

$$\bar{X}_{.j} = \frac{\sum_{i=1}^{a_j} X_{ij}}{a_j}, \qquad V = \frac{\sum_{j=1}^{b}\sum_{i=1}^{a_j} (X_{ij} - \bar{X}_{.j})^2}{\sum_{j=1}^{b} (a_j - 1)},$$

and $d$ is selected from Table V with $b$ and $\sum_{j=1}^{b} (a_j - 1)$ degrees of freedom.
Inequality (4') reduces to inequality (4) when $a_1 = a_2 = \cdots = a_b$.

Moreover, if we restrict our attention to linear functions of the form $\sum_{j=1}^{b} k_j\mu_j$
with $\sum_{j=1}^{b} k_j = 0$ (such linear functions are called contrasts), the radical in
inequality (4') is replaced by

$$\sqrt{d(b - 1)\left(\sum_{j=1}^{b}\frac{k_j^2}{a_j}\right)V},$$

where $d$ is now found in Table V with $b - 1$ and $\sum_{j=1}^{b} (a_j - 1)$ degrees of freedom.

In these multiple comparisons, one often finds that the length of a
confidence interval is much greater than the length of a $100(1 - \alpha)$ percent
confidence interval for a particular linear function $\sum_{j=1}^{b} k_j\mu_j$. But this is to be
expected because in one case the probability $1 - \alpha$ applies to just one event,
and in the other it applies to the simultaneous occurrence of many events.
One reasonable way to reduce the length of these intervals is to take a larger
value of $\alpha$, say 0.25, instead of 0.05. After all, it is still a very strong statement
to say that the probability is 0.75 that all these events occur.


EXERCISES 

10.17. If $A_1, A_2, \ldots, A_k$ are events, prove, by induction, Boole's inequality
$P(A_1 \cup A_2 \cup \cdots \cup A_k) \le \sum_{i=1}^{k} P(A_i)$. Then show that

$$P(A_1^* \cap A_2^* \cap \cdots \cap A_k^*) \ge 1 - \sum_{i=1}^{k} P(A_i).$$

10.18. In the notation of this section, let $(k_{i1}, k_{i2}, \ldots, k_{ib})$, $i = 1, 2, \ldots, m$,
represent a finite number of $b$-tuples. The problem is to find simultaneous
confidence intervals for $\sum_{j=1}^{b} k_{ij}\mu_j$, $i = 1, 2, \ldots, m$, by a method different
from that of Scheffé. Define the random variable $T_i$ by

$$T_i = \frac{\sum_{j=1}^{b} k_{ij}\bar{X}_{.j} - \sum_{j=1}^{b} k_{ij}\mu_j}{\sqrt{\left(\sum_{j=1}^{b} k_{ij}^2\right)V/a}}, \qquad i = 1, 2, \ldots, m.$$

(a) Let the event $A_i^*$ be given by $-c_i \le T_i \le c_i$, $i = 1, 2, \ldots, m$. Find the
random variables $U_i$ and $W_i$ such that $U_i \le \sum_{j=1}^{b} k_{ij}\mu_j \le W_i$ is equivalent
to $A_i^*$.

(b) Select $c_i$ such that $P(A_i^*) = 1 - \alpha/m$; that is, $P(A_i) = \alpha/m$. Use the
results of Exercise 10.17 to determine a lower bound on the probability
that simultaneously the random intervals $(U_1, W_1), \ldots, (U_m, W_m)$
include $\sum_{j=1}^{b} k_{1j}\mu_j, \ldots, \sum_{j=1}^{b} k_{mj}\mu_j$, respectively.

(c) Let $a = 3$, $b = 6$, and $\alpha = 0.05$. Consider the linear functions $\mu_1 - \mu_2$,
$\mu_2 - \mu_3$, $\mu_3 - \mu_4$, $\mu_4 - (\mu_5 + \mu_6)/2$, and $(\mu_1 + \mu_2 + \cdots + \mu_6)/6$. Here
$m = 5$. Show that the lengths of the confidence intervals given by the
results of part (b) are shorter than the corresponding ones given by
the method of Scheffé, as described in the text. If $m$ becomes
sufficiently large, however, this is not the case.
^ *■ £ 
10.5 The Analysis of Variance

The problem considered in Section 10.2 is an example of a method
of statistical inference called the analysis of variance. This method
derives its name from the fact that the quadratic form $abS^2$, which is
a total sum of squares, is resolved into several component parts. In this
section other problems in the analysis of variance will be investigated.

Let $X_{ij}$, $i = 1, 2, \ldots, a$ and $j = 1, 2, \ldots, b$, denote $n = ab$ random
variables that are independent and have normal distributions with
common variance $\sigma^2$. The means of these normal distributions are
$\mu_{ij} = \mu + \alpha_i + \beta_j$, where $\sum_{i=1}^{a}\alpha_i = 0$ and $\sum_{j=1}^{b}\beta_j = 0$. For example, take
$a = 2$, $b = 3$, $\mu = 5$, $\alpha_1 = 1$, $\alpha_2 = -1$, $\beta_1 = 1$, $\beta_2 = 0$, and $\beta_3 = -1$.
Then the $ab = $ six random variables have means

$$\begin{array}{lll}
\mu_{11} = 7, & \mu_{12} = 6, & \mu_{13} = 5, \\
\mu_{21} = 5, & \mu_{22} = 4, & \mu_{23} = 3.
\end{array}$$

Had we taken $\beta_1 = \beta_2 = \beta_3 = 0$, the six random variables would have
had means

$$\begin{array}{lll}
\mu_{11} = 6, & \mu_{12} = 6, & \mu_{13} = 6, \\
\mu_{21} = 4, & \mu_{22} = 4, & \mu_{23} = 4.
\end{array}$$

Thus, if we wish to test the composite hypothesis that

$$\begin{array}{c}
\mu_{11} = \mu_{12} = \cdots = \mu_{1b}, \\
\mu_{21} = \mu_{22} = \cdots = \mu_{2b}, \\
\vdots \\
\mu_{a1} = \mu_{a2} = \cdots = \mu_{ab},
\end{array}$$

we could say that we are testing the composite hypothesis that
$\beta_1 = \beta_2 = \cdots = \beta_b$ (and hence each $\beta_j = 0$, since their sum is zero). On
the other hand, the composite hypothesis

$$\begin{array}{c}
\mu_{11} = \mu_{21} = \cdots = \mu_{a1}, \\
\mu_{12} = \mu_{22} = \cdots = \mu_{a2}, \\
\vdots \\
\mu_{1b} = \mu_{2b} = \cdots = \mu_{ab}
\end{array}$$

is the same as the composite hypothesis that $\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$.
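
The two tables of means in the example follow directly from $\mu_{ij} = \mu + \alpha_i + \beta_j$. As a small sketch (the helper name `cell_means` is an assumption introduced here), the arithmetic can be reproduced as follows.

```python
def cell_means(mu, alpha, beta):
    """Return the a-by-b table of means mu_ij = mu + alpha_i + beta_j."""
    return [[mu + a_i + b_j for b_j in beta] for a_i in alpha]

# Values taken from the example in the text: a = 2, b = 3, mu = 5.
with_column_effects = cell_means(5, [1, -1], [1, 0, -1])
no_column_effects = cell_means(5, [1, -1], [0, 0, 0])
print(with_column_effects)   # rows: mu_11, mu_12, mu_13 and mu_21, mu_22, mu_23
print(no_column_effects)
```

When every $\beta_j$ is zero, each row of the table is constant, which is the first composite hypothesis displayed above.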
Remarks. The model just described, and others similar to it, are widely
used in statistical applications. Consider a situation in which it is desirable to
investigate the effects of two factors that influence an outcome. Thus the
variety of a grain and the type of fertilizer used influence the yield; or the
teacher and the size of a class may influence the score on a standard test. Let
$X_{ij}$ denote the yield from the use of variety $i$ of a grain and type $j$ of fertilizer.
A test of the hypothesis that $\beta_1 = \beta_2 = \cdots = \beta_b = 0$ would then be a test of
the hypothesis that the mean yield of each variety of grain is the same
regardless of the type of fertilizer used.

There is no loss of generality in assuming that $\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0$. To see this,
let $\mu_{ij} = \mu' + \alpha_i' + \beta_j'$. Write $\bar{\alpha}' = \sum_{i=1}^{a}\alpha_i'/a$ and $\bar{\beta}' = \sum_{j=1}^{b}\beta_j'/b$. We have
$\mu_{ij} = (\mu' + \bar{\alpha}' + \bar{\beta}') + (\alpha_i' - \bar{\alpha}') + (\beta_j' - \bar{\beta}') = \mu + \alpha_i + \beta_j$, where
$\sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0$.

To construct a test of the composite hypothesis $H_0: \beta_1 =
\beta_2 = \cdots = \beta_b = 0$ against all alternative hypotheses, we could obtain
the corresponding likelihood ratio. However, to gain more insight
into such a test, let us reconsider the likelihood ratio test of
Section 10.2, namely that of the equality of the means of $b$ distributions.
There the important quadratic forms are $Q$, $Q_3$, and $Q_4$, which are
related through the equation $Q = Q_4 + Q_3$. That is,

$$abS^2 = \sum_{j=1}^{b}\sum_{i=1}^{a} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{.j})^2;$$

so we see that the total sum of squares, $abS^2$, is decomposed into a sum
of squares, $Q_4$, among column means and a sum of squares, $Q_3$, within
columns. The latter sum of squares, divided by $n = ab$, is the m.l.e. of
$\sigma^2$, provided that the parameters are in $\Omega$; and we denote it by $\hat{\sigma}^2_\Omega$. Of
course, $S^2$ is the m.l.e. of $\sigma^2$ under $\omega$, here denoted by $\hat{\sigma}^2_\omega$. So the
likelihood ratio $\lambda = (\hat{\sigma}^2_\Omega/\hat{\sigma}^2_\omega)^{ab/2}$ is a monotone function of the statistic

$$F = \frac{Q_4/(b - 1)}{Q_3/[b(a - 1)]}$$

upon which the test of the equality of means is based.

To help find a test for $H_0: \beta_1 = \beta_2 = \cdots = \beta_b = 0$, where $\mu_{ij} =
\mu + \alpha_i + \beta_j$, return to the decomposition of Example 3, Section 10.1,
namely $Q = Q_2 + Q_4 + Q_5$. That is,

$$abS^2 = \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{i.} - \bar{X}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{.j} - \bar{X}_{..})^2 + \sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}_{..})^2;$$

thus the total sum of squares, $abS^2$, is decomposed into that among
rows ($Q_2$), that among columns ($Q_4$), and that remaining ($Q_5$). It is
interesting to observe that $\hat{\sigma}^2_\Omega = Q_5/ab$ is the m.l.e. of $\sigma^2$ under $\Omega$ and

$$\hat{\sigma}^2_\omega = \frac{Q_4 + Q_5}{ab} = \sum_{i=1}^{a}\sum_{j=1}^{b} \frac{(X_{ij} - \bar{X}_{i.})^2}{ab}$$

is that estimator under $\omega$. A useful monotone function of the
likelihood ratio $\lambda = (\hat{\sigma}^2_\Omega/\hat{\sigma}^2_\omega)^{ab/2}$ is

$$F = \frac{Q_4/(b - 1)}{Q_5/[(a - 1)(b - 1)]},$$

which has, under $H_0$, an F-distribution with $b - 1$ and $(a - 1)(b - 1)$
degrees of freedom. The hypothesis $H_0$ is rejected if $F \ge c$, where
$\alpha = \Pr(F \ge c; H_0)$.

If we are to compute the power function of the test, we need
the distribution of $F$ when $H_0$ is not true. From Section 10.3 we
know, when $H_1$ is true, that $Q_4/\sigma^2$ and $Q_5/\sigma^2$ are independent (central
or noncentral) chi-square variables. We shall compute the noncentrality
parameters of $Q_4/\sigma^2$ and $Q_5/\sigma^2$ when $H_1$ is true. We have
$E(X_{ij}) = \mu + \alpha_i + \beta_j$, $E(\bar{X}_{i\cdot}) = \mu + \alpha_i$, $E(\bar{X}_{\cdot j}) = \mu + \beta_j$, and $E(\bar{X}_{\cdot\cdot}) = \mu$.
Accordingly, the noncentrality parameter of $Q_4/\sigma^2$ is

\[
\frac{a \sum_{j=1}^{b} (\mu + \beta_j - \mu)^2}{\sigma^2} = \frac{a \sum_{j=1}^{b} \beta_j^2}{\sigma^2}
\]

and that of $Q_5/\sigma^2$ is

\[
\frac{\sum_{j=1}^{b}\sum_{i=1}^{a} (\mu + \alpha_i + \beta_j - \mu - \alpha_i - \mu - \beta_j + \mu)^2}{\sigma^2} = 0.
\]

Thus, if the hypothesis $H_0$ is not true, $F$ has a noncentral F-distribution
with $b - 1$ and $(a - 1)(b - 1)$ degrees of freedom and noncentrality
parameter $a \sum_{j=1}^{b} \beta_j^2/\sigma^2$. The desired probabilities can then be found in
tables of the noncentral F-distribution.

A similar argument can be used to construct the $F$ needed to test
the equality of row means; that is, this $F$ is essentially the ratio of the
sum of squares among rows and $Q_5$. In particular, this $F$ is defined by

\[
F = \frac{Q_2/(a-1)}{Q_5/[(a-1)(b-1)]}
\]

and, under $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$, has an F-distribution with
$a - 1$ and $(a - 1)(b - 1)$ degrees of freedom.
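The decomposition and the two F-ratios above can be sketched numerically. The following is a minimal illustration only: the function name `two_way_anova_one_obs` and the yield figures are hypothetical, chosen for the example, and are not taken from the text.

```python
def two_way_anova_one_obs(x):
    """Two-way classification with one observation per cell.

    x[i][j] is the observation in row i and column j (a rows, b columns).
    Returns the sums of squares Q, Q2, Q4, Q5 and the two F-ratios.
    """
    a, b = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (a * b)
    row_means = [sum(row) / b for row in x]
    col_means = [sum(x[i][j] for i in range(a)) / a for j in range(b)]

    q = sum((x[i][j] - grand) ** 2 for i in range(a) for j in range(b))
    q2 = b * sum((m - grand) ** 2 for m in row_means)   # among rows
    q4 = a * sum((m - grand) ** 2 for m in col_means)   # among columns
    q5 = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
             for i in range(a) for j in range(b))       # remainder

    denom = q5 / ((a - 1) * (b - 1))
    return {
        "Q": q, "Q2": q2, "Q4": q4, "Q5": q5,
        "F_columns": (q4 / (b - 1)) / denom,  # b-1 and (a-1)(b-1) d.f.
        "F_rows": (q2 / (a - 1)) / denom,     # a-1 and (a-1)(b-1) d.f.
    }


# Hypothetical yields: 3 varieties (rows) by 4 fertilizers (columns).
data = [[6.0, 8.0, 7.0, 9.0],
        [5.0, 6.0, 8.0, 9.0],
        [4.0, 7.0, 6.0, 8.0]]
result = two_way_anova_one_obs(data)
print(result)
```

Each F-ratio would then be compared with a tabled critical value of the F-distribution with the degrees of freedom noted in the comments.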

The analysis-of-variance problem that has just been discussed is
usually referred to as a two-way classification with one observation per
cell. Each combination of $i$ and $j$ determines a cell; thus there is a total
of $ab$ cells in this model. Let us now investigate another two-way
classification problem, but in this case we take $c > 1$ independent
observations per cell.

Let $X_{ijk}$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, and $k = 1, 2, \ldots, c$,
denote $n = abc$ random variables which are independent and which
have normal distributions with common, but unknown, variance $\sigma^2$.
The mean of each $X_{ijk}$, $k = 1, 2, \ldots, c$, is $\mu_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij}$, where
$\sum_{i=1}^{a} \alpha_i = 0$, $\sum_{j=1}^{b} \beta_j = 0$, $\sum_{i=1}^{a} \gamma_{ij} = 0$, and $\sum_{j=1}^{b} \gamma_{ij} = 0$. For example, take
$a = 2$, $b = 3$, $\mu = 5$, $\alpha_1 = 1$, $\alpha_2 = -1$, $\beta_1 = 1$, $\beta_2 = 0$, $\beta_3 = -1$,
$\gamma_{11} = 1$, $\gamma_{12} = 1$, $\gamma_{13} = -2$, $\gamma_{21} = -1$, $\gamma_{22} = -1$, and $\gamma_{23} = 2$. Then the means are

\[
\begin{array}{lll}
\mu_{11} = 8, & \mu_{12} = 7, & \mu_{13} = 3, \\
\mu_{21} = 4, & \mu_{22} = 3, & \mu_{23} = 5.
\end{array}
\]

Note that, if each $\gamma_{ij} = 0$, then

\[
\begin{array}{lll}
\mu_{11} = 7, & \mu_{12} = 6, & \mu_{13} = 5, \\
\mu_{21} = 5, & \mu_{22} = 4, & \mu_{23} = 3.
\end{array}
\]

That is, if each $\gamma_{ij} = 0$, each of the means in the first row is 2 greater than
the corresponding mean in the second row. In general, if each $\gamma_{ij} = 0$,
the means of row $i_1$ differ from the corresponding means of row $i_2$ by
a constant. This constant may be different for different choices of $i_1$ and
$i_2$. A similar statement can be made about the means of columns $j_1$ and
$j_2$. The parameter $\gamma_{ij}$ is called the interaction associated with cell $(i, j)$.
That is, the interaction between the $i$th level of one classification and
the $j$th level of the other classification is $\gamma_{ij}$. One interesting hypothesis
to test is that each interaction is equal to zero. This will now be
investigated.
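The arithmetic of the example above can be checked with a few lines of code. This is only an illustrative sketch; the helper name `cell_means` is an assumption introduced here, while the parameter values are those of the example.

```python
def cell_means(mu, alpha, beta, gamma):
    """Return the matrix of cell means mu + alpha_i + beta_j + gamma_ij."""
    return [[mu + alpha[i] + beta[j] + gamma[i][j]
             for j in range(len(beta))]
            for i in range(len(alpha))]


mu = 5
alpha = [1, -1]
beta = [1, 0, -1]
gamma = [[1, 1, -2], [-1, -1, 2]]
zero = [[0, 0, 0], [0, 0, 0]]

with_interaction = cell_means(mu, alpha, beta, gamma)
additive = cell_means(mu, alpha, beta, zero)

# Row 1 minus row 2, column by column.
diff_additive = [additive[0][j] - additive[1][j] for j in range(3)]
diff_interaction = [with_interaction[0][j] - with_interaction[1][j]
                    for j in range(3)]

print(with_interaction)   # [[8, 7, 3], [4, 3, 5]]
print(additive)           # [[7, 6, 5], [5, 4, 3]]
print(diff_additive)      # [2, 2, 2]: constant when every gamma_ij is 0
print(diff_interaction)   # [4, 4, -2]: not constant when interaction is present
```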

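From Exercise 10.4 of Section 10.1 we have that

\[
\begin{aligned}
\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{\cdot\cdot\cdot})^2
 ={}& bc \sum_{i=1}^{a} (\bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot})^2
   + ac \sum_{j=1}^{b} (\bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot\cdot\cdot})^2 \\
 &+ c \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot})^2 \\
 &+ \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2;
\end{aligned}
\]

that is, the total sum of squares is decomposed into that due to row
differences, that due to column differences, that due to interaction, and
that within cells. The test of

\[
H_0: \gamma_{ij} = 0, \quad i = 1, 2, \ldots, a, \quad j = 1, 2, \ldots, b,
\]

against all possible alternatives is based upon an $F$ with $(a - 1)(b - 1)$
and $ab(c - 1)$ degrees of freedom,

\[
F = \frac{\left[ c \sum_{i=1}^{a}\sum_{j=1}^{b} (\bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot})^2 \right] \big/ [(a-1)(b-1)]}
         {\left[ \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{k=1}^{c} (X_{ijk} - \bar{X}_{ij\cdot})^2 \right] \big/ [ab(c-1)]}.
\]

The reader should verify that the noncentrality parameter of this
F-distribution is equal to $c \sum_{j=1}^{b}\sum_{i=1}^{a} \gamma_{ij}^2/\sigma^2$. Thus $F$ is central when
$H_0: \gamma_{ij} = 0$, $i = 1, 2, \ldots, a$, $j = 1, 2, \ldots, b$, is true.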
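As a rough numerical check of the four-part decomposition and the interaction F-ratio, one might compute them as follows. The function name `two_way_anova_replicated` and the observations are hypothetical and serve only to illustrate the formulas above.

```python
def two_way_anova_replicated(x):
    """Two-way classification with c > 1 observations per cell.

    x[i][j] is the list of the c observations in cell (i, j).
    """
    a, b, c = len(x), len(x[0]), len(x[0][0])
    cell = [[sum(x[i][j]) / c for j in range(b)] for i in range(a)]
    row = [sum(cell[i]) / b for i in range(a)]
    col = [sum(cell[i][j] for i in range(a)) / a for j in range(b)]
    grand = sum(row) / a

    cells = [(i, j) for i in range(a) for j in range(b)]
    ss_total = sum((v - grand) ** 2 for i, j in cells for v in x[i][j])
    ss_rows = b * c * sum((r - grand) ** 2 for r in row)
    ss_cols = a * c * sum((m - grand) ** 2 for m in col)
    ss_inter = c * sum((cell[i][j] - row[i] - col[j] + grand) ** 2
                       for i, j in cells)
    ss_within = sum((v - cell[i][j]) ** 2 for i, j in cells for v in x[i][j])

    # F for H0: every interaction gamma_ij is zero,
    # with (a-1)(b-1) and ab(c-1) degrees of freedom.
    f_inter = (ss_inter / ((a - 1) * (b - 1))) / (ss_within / (a * b * (c - 1)))
    return ss_total, ss_rows, ss_cols, ss_inter, ss_within, f_inter


# Hypothetical data with a = 2, b = 3, c = 2.
obs = [[[8.1, 7.9], [7.2, 6.8], [3.1, 2.9]],
       [[4.2, 3.8], [3.1, 2.9], [5.3, 4.7]]]
total, rows, cols, inter, within, f_inter = two_way_anova_replicated(obs)
print(total, rows + cols + inter + within, f_inter)
```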

EXERCISES 

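10.19. Show that

\[
\sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{i\cdot})^2
 = \sum_{j=1}^{b}\sum_{i=1}^{a} (X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot})^2
 + a \sum_{j=1}^{b} (\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})^2.
\]

10.20. If at least one $\gamma_{ij} \ne 0$, show that the $F$, which is used to test that each
interaction is equal to zero, has noncentrality parameter equal to
$c \sum_{j=1}^{b}\sum_{i=1}^{a} \gamma_{ij}^2/\sigma^2$.

10.21. Using the background of the two-way classification with one
observation per cell, show that the maximum likelihood estimators of $\alpha_i$, $\beta_j$,
and $\mu$ are $\hat{\alpha}_i = \bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}$, $\hat{\beta}_j = \bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}$, and $\hat{\mu} = \bar{X}_{\cdot\cdot}$, respectively.
Show that these are unbiased estimators of their respective parameters
and compute $\operatorname{var}(\hat{\alpha}_i)$, $\operatorname{var}(\hat{\beta}_j)$, and $\operatorname{var}(\hat{\mu})$.

10.22. Prove, using the assumptions of this section, that the linear functions
$X_{ij} - \bar{X}_{i\cdot} - \bar{X}_{\cdot j} + \bar{X}_{\cdot\cdot}$ and $\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot}$ are uncorrelated.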

• 1 

10.23. Given the following observations associated with a two-way
classification with $a = 3$ and $b = 4$, compute the F-statistics used to test
the equality of the column means ($\beta_1 = \beta_2 = \beta_3 = \beta_4 = 0$) and the equality
of the row means ($\alpha_1 = \alpha_2 = \alpha_3 = 0$), respectively.

Row/Column      1      2      3      4

    1          3.1    4.2    2.7    4.9
    2          2.7    2.9    1.8    3.0
    3          4.0    4.6    3.0    3.9


10.24. With the background of the two-way classification with $c > 1$
observations per cell, show that the maximum likelihood estimators
of the parameters are $\hat{\alpha}_i = \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot\cdot\cdot}$, $\hat{\beta}_j = \bar{X}_{\cdot j\cdot} - \bar{X}_{\cdot\cdot\cdot}$,
$\hat{\gamma}_{ij} = \bar{X}_{ij\cdot} - \bar{X}_{i\cdot\cdot} - \bar{X}_{\cdot j\cdot} + \bar{X}_{\cdot\cdot\cdot}$, and $\hat{\mu} = \bar{X}_{\cdot\cdot\cdot}$. Show that these are unbiased estimators of
the respective parameters. Compute the variance of each estimator.

10.25. Given the following observations in a two-way classification with
$a = 3$, $b = 4$, and $c = 2$, compute the F-statistics used to test that all
interactions are equal to zero ($\gamma_{ij} = 0$), all column means are equal ($\beta_j = 0$),
and all row means are equal ($\alpha_i = 0$), respectively.


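Row/Column      1      2      3      4

    1          3.1    4.2    2.7    4.9
               2.9    4.9    3.2    4.5

    2          2.7    2.9    1.8    3.0
               2.9    2.3    2.4    3.7

    3          4.0    4.6    3.0    3.9
               4.4    5.0    2.5    4.2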


10.6 A Regression Problem

There is often interest in the relation between two variables, for
example, a student's scholastic aptitude test score in mathematics and
this same student's grade in calculus. Frequently, one of these
variables, say $x$, is known in advance of the other, and hence there is
interest in predicting a future random variable $Y$. Since $Y$ is a random
variable, we cannot predict its future observed value $Y = y$ with
certainty. Thus let us first concentrate on the problem of estimating
the mean of $Y$, that is, $E(Y)$. Now $E(Y)$ is usually a function
of $x$; for example, in our illustration with the calculus grade,
say $Y$, we would expect $E(Y)$ to increase with increasing mathematics
aptitude score $x$. Sometimes $E(Y) = \mu(x)$ is assumed to be of a given
form, such as linear or quadratic or exponential; that is, $\mu(x)$ could be
assumed to be equal to $\alpha + \beta x$ or $\alpha + \beta x + \gamma x^2$ or $\alpha e^{\beta x}$. To estimate
$E(Y) = \mu(x)$, or equivalently the parameters $\alpha$, $\beta$, and $\gamma$, we observe the
random variable $Y$ for each of $n$ possibly different values of $x$, say
$x_1, x_2, \ldots, x_n$, which are not all equal. Once the $n$ independent
experiments have been performed, we have $n$ pairs of known numbers
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. These pairs are then used to estimate the
mean $E(Y)$. Problems like this are often classified under regression
because $E(Y) = \mu(x)$ is frequently called a regression curve.

Remark. A model for the mean like $\alpha + \beta x + \gamma x^2$ is called a linear model
because it is linear in the parameters $\alpha$, $\beta$, and $\gamma$. Thus $\alpha e^{\beta x}$ is not a linear model
because it is not linear in $\alpha$ and $\beta$. Note that, in Sections 10.1 to 10.4, all the
means were linear in the parameters and hence linear models.

Let us begin with the case in which $E(Y) = \mu(x)$ is a linear function.
The $n$ points are $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$; so the first problem
is that of fitting a straight line to the set of points (see Figure 10.1).
In addition to assuming that the mean of $Y$ is a linear function,
we assume that $Y_1, Y_2, \ldots, Y_n$ are independent normal variables
with respective means $\alpha + \beta(x_i - \bar{x})$, $i = 1, 2, \ldots, n$, and unknown
variance $\sigma^2$, where $\bar{x} = \sum x_i/n$. Their joint p.d.f. is therefore the
product of the individual probability density functions; that is, the
likelihood function equals

\[
\begin{aligned}
L(\alpha, \beta, \sigma^2) &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{[y_i - \alpha - \beta(x_i - \bar{x})]^2}{2\sigma^2} \right\} \\
&= \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})]^2 \right\}.
\end{aligned}
\]

FIGURE 10.1

To maximize $L(\alpha, \beta, \sigma^2)$, or, equivalently, to minimize

\[
-\ln L(\alpha, \beta, \sigma^2) = \frac{n}{2} \ln(2\pi\sigma^2) + \frac{\sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})]^2}{2\sigma^2},
\]

we must select $\alpha$ and $\beta$ to minimize

\[
H(\alpha, \beta) = \sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})]^2.
\]

Since $|y_i - \alpha - \beta(x_i - \bar{x})| = |y_i - \mu(x_i)|$ is the vertical distance from the
point $(x_i, y_i)$ to the line $y = \mu(x)$, we note that $H(\alpha, \beta)$ represents the
sum of the squares of those distances. Thus selecting $\alpha$ and $\beta$ so that
the sum of the squares is minimized means that we are fitting the
straight line to the data by the method of least squares.

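To minimize $H(\alpha, \beta)$, we find the two first partial derivatives

\[
\frac{\partial H(\alpha, \beta)}{\partial \alpha} = 2 \sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})](-1)
\]

and

\[
\frac{\partial H(\alpha, \beta)}{\partial \beta} = 2 \sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})][-(x_i - \bar{x})].
\]

Setting $\partial H(\alpha, \beta)/\partial \alpha = 0$, we obtain

\[
\sum_{i=1}^{n} y_i - n\alpha - \beta \sum_{i=1}^{n} (x_i - \bar{x}) = 0.
\]

Since

\[
\sum_{i=1}^{n} (x_i - \bar{x}) = 0,
\]

we have that

\[
\sum_{i=1}^{n} y_i - n\alpha = 0
\]

and thus $\hat{\alpha} = \bar{Y}$.

The equation $\partial H(\alpha, \beta)/\partial \beta = 0$ yields, with $\alpha$ replaced by $\bar{y}$,

\[
\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x}) - \beta \sum_{i=1}^{n} (x_i - \bar{x})^2 = 0
\]

or, equivalently,

\[
\hat{\beta} = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
            = \frac{\sum_{i=1}^{n} Y_i (x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\]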
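The two normal equations just solved translate directly into code. The sketch below assumes the centered parametrization $\alpha + \beta(x_i - \bar{x})$ used in this section; the function name `fit_centered_line` and the five data points are hypothetical.

```python
def fit_centered_line(x, y):
    """Least squares fit of y = alpha + beta * (x - x_bar).

    Returns (alpha_hat, beta_hat, x_bar).
    """
    n = len(x)
    x_bar = sum(x) / n
    alpha_hat = sum(y) / n                      # alpha_hat is the mean of the y's
    sxx = sum((xi - x_bar) ** 2 for xi in x)    # must be positive
    beta_hat = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    return alpha_hat, beta_hat, x_bar


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
alpha_hat, beta_hat, x_bar = fit_centered_line(xs, ys)

# The residuals satisfy both normal equations.
residuals = [yi - alpha_hat - beta_hat * (xi - x_bar) for xi, yi in zip(xs, ys)]
print(alpha_hat, beta_hat)                                    # 3.0 0.8
print(sum(residuals))                                         # approximately 0
print(sum(r * (xi - x_bar) for r, xi in zip(residuals, xs)))  # approximately 0
```

Centering $x$ at $\bar{x}$ is what makes the two equations decouple, so that $\hat{\alpha}$ does not depend on $\hat{\beta}$.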


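To find the maximum likelihood estimator of $\sigma^2$, consider the partial
derivative

\[
\frac{\partial[-\ln L(\alpha, \beta, \sigma^2)]}{\partial(\sigma^2)}
 = \frac{n}{2\sigma^2} - \frac{\sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})]^2}{2(\sigma^2)^2}.
\]

Setting this equal to zero and replacing $\alpha$ and $\beta$ by their solutions $\hat{\alpha}$
and $\hat{\beta}$, we obtain

\[
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} [Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2.
\]

Of course, due to the invariance of m.l.e., $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$.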

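Since $\hat{\alpha}$ is a linear function of independent and normally distributed
random variables, $\hat{\alpha}$ has a normal distribution with mean

\[
E(\hat{\alpha}) = E\left( \frac{1}{n} \sum_{i=1}^{n} Y_i \right) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i)
 = \frac{1}{n} \sum_{i=1}^{n} [\alpha + \beta(x_i - \bar{x})] = \alpha
\]

and variance

\[
\operatorname{var}(\hat{\alpha}) = \sum_{i=1}^{n} \left( \frac{1}{n} \right)^2 \operatorname{var}(Y_i) = \frac{\sigma^2}{n}.
\]

The estimator $\hat{\beta}$ is also a linear function of $Y_1, Y_2, \ldots, Y_n$ and hence
has a normal distribution with mean

\[
\begin{aligned}
E(\hat{\beta}) &= \frac{\sum_{i=1}^{n} (x_i - \bar{x}) E(Y_i)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})[\alpha + \beta(x_i - \bar{x})]}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \\
&= \frac{\alpha \sum_{i=1}^{n} (x_i - \bar{x}) + \beta \sum_{i=1}^{n} (x_i - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \beta
\end{aligned}
\]

and variance

\[
\operatorname{var}(\hat{\beta}) = \sum_{i=1}^{n} \left[ \frac{x_i - \bar{x}}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \right]^2 \operatorname{var}(Y_i)
 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]^2} \, \sigma^2
 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
\]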


It can be shown (Exercise 10.27) that

\[
\begin{aligned}
\sum_{i=1}^{n} [Y_i - \alpha - \beta(x_i - \bar{x})]^2
 &= \sum_{i=1}^{n} \{ (\hat{\alpha} - \alpha) + (\hat{\beta} - \beta)(x_i - \bar{x}) + [Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})] \}^2 \\
 &= n(\hat{\alpha} - \alpha)^2 + (\hat{\beta} - \beta)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\hat{\sigma}^2,
\end{aligned}
\]

or, for brevity,

\[
Q = Q_1 + Q_2 + Q_3.
\]

Here $Q$, $Q_1$, $Q_2$, and $Q_3$ are real quadratic forms in the variables

\[
Y_i - \alpha - \beta(x_i - \bar{x}), \quad i = 1, 2, \ldots, n.
\]

In this equation, $Q$ represents the sum of the squares of $n$ independent
random variables that have normal distributions with means zero and
variances $\sigma^2$. Thus $Q/\sigma^2$ has a chi-square distribution with $n$ degrees
of freedom. Each of the random variables $\sqrt{n}(\hat{\alpha} - \alpha)/\sigma$ and
$\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,(\hat{\beta} - \beta)/\sigma$ has a normal distribution with zero mean
and unit variance; thus each of $Q_1/\sigma^2$ and $Q_2/\sigma^2$ has a chi-square
distribution with 1 degree of freedom. Since $Q_3$ is nonnegative, we
have, in accordance with the theorem of Section 10.1, that $Q_1$, $Q_2$, and
$Q_3$ are independent, so that $Q_3/\sigma^2$ has a chi-square distribution with
$n - 1 - 1 = n - 2$ degrees of freedom. Then each of the random
variables

\[
T_1 = \frac{[\sqrt{n}(\hat{\alpha} - \alpha)]/\sigma}{\sqrt{Q_3/[\sigma^2(n - 2)]}}
    = \frac{\hat{\alpha} - \alpha}{\sqrt{\hat{\sigma}^2/(n - 2)}}
\]

and

\[
T_2 = \frac{\left[ \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,(\hat{\beta} - \beta) \right] \big/ \sigma}{\sqrt{Q_3/[\sigma^2(n - 2)]}}
    = \frac{\hat{\beta} - \beta}{\sqrt{n\hat{\sigma}^2 \Big/ \left[ (n - 2) \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]}}
\]

has a t-distribution with $n - 2$ degrees of freedom. These facts
enable us to obtain confidence intervals for $\alpha$ and $\beta$. The fact that $n\hat{\sigma}^2/\sigma^2$
has a chi-square distribution with $n - 2$ degrees of freedom provides
a means of determining a confidence interval for $\sigma^2$. These are some
of the statistical inferences about the parameters to which reference was
made in the introductory remarks of this section.
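Inverting $T_1$ and $T_2$ gives confidence intervals of the form estimate plus or minus a t-quantile times the denominator of the corresponding statistic. The sketch below is illustrative only: the function name `regression_intervals` and the data are hypothetical, and the critical value is passed in by the caller rather than computed (3.182 is the tabled 0.975 quantile of the t-distribution with 3 degrees of freedom).

```python
import math


def regression_intervals(x, y, t_crit):
    """Confidence intervals for alpha and beta in E(Y) = alpha + beta(x - x_bar).

    t_crit is the appropriate quantile of the t-distribution with n - 2
    degrees of freedom, supplied by the caller (for example, from a table).
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    alpha_hat = sum(y) / n
    beta_hat = sum(yi * (xi - x_bar) for xi, yi in zip(x, y)) / sxx

    # Q3 = n * sigma2_hat, the residual sum of squares.
    q3 = sum((yi - alpha_hat - beta_hat * (xi - x_bar)) ** 2
             for xi, yi in zip(x, y))
    sigma2_hat = q3 / n

    half_alpha = t_crit * math.sqrt(sigma2_hat / (n - 2))
    half_beta = t_crit * math.sqrt(q3 / ((n - 2) * sxx))
    return {
        "Q3": q3,
        "alpha": (alpha_hat - half_alpha, alpha_hat + half_alpha),
        "beta": (beta_hat - half_beta, beta_hat + half_beta),
    }


# Hypothetical data with n = 5, so n - 2 = 3 degrees of freedom.
ci = regression_intervals([1.0, 2.0, 3.0, 4.0, 5.0],
                          [2.0, 1.0, 4.0, 3.0, 5.0],
                          t_crit=3.182)
print(ci)
```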


Remark. The more discerning reader should quite properly question
our constructions of $T_1$ and $T_2$ immediately above. We know that the squares
of the linear forms are independent of $Q_3 = n\hat{\sigma}^2$, but we do not know, at this
time, that the linear forms themselves enjoy this independence. This problem
arises again in Section 10.7. In Exercise 10.47, a more general problem is
proposed, of which the present case is a special instance.


EXERCISES

10.26. Students' scores on the mathematics portion of the ACT examination, $x$,
and on the final examination in first-semester calculus (200 points
possible), $y$, are given.

     x      y          x      y
    25    138         20    100
    20     84         25    143
    26    104         26    141
    26    112         28    161
    28     88         25    124
    28    132         31    118
    29     90         30    168
    32    183

(a) Calculate the least squares regression line for these data.

(b) Plot the points and the least squares regression line on the same graph.

(c) Find point estimates for $\alpha$, $\beta$, and $\sigma^2$.

(d) Find 95 percent confidence intervals for $\alpha$ and $\beta$ under the usual
assumptions.

10.27. Show that

\[
\sum_{i=1}^{n} [Y_i - \alpha - \beta(x_i - \bar{x})]^2
 = n(\hat{\alpha} - \alpha)^2 + (\hat{\beta} - \beta)^2 \sum_{i=1}^{n} (x_i - \bar{x})^2
 + \sum_{i=1}^{n} [Y_i - \hat{\alpha} - \hat{\beta}(x_i - \bar{x})]^2.
\]

10.28. Let the independent random variables $Y_1, Y_2, \ldots, Y_n$ have,
respectively, the probability density functions $N(\beta x_i, \gamma^2 x_i^2)$, $i = 1, 2, \ldots, n$,
where the given numbers $x_1, x_2, \ldots, x_n$ are not all equal and no one is zero.
Find the maximum likelihood estimators of $\beta$ and $\gamma^2$.

10.29. Let the independent random variables $Y_1, \ldots, Y_n$ have the joint p.d.f.

\[
L(\alpha, \beta, \sigma^2) = \left( \frac{1}{2\pi\sigma^2} \right)^{n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} [y_i - \alpha - \beta(x_i - \bar{x})]^2 \right\},
\]

where the given numbers $x_1, x_2, \ldots, x_n$ are not all equal. Let $H_0: \beta = 0$ ($\alpha$
and $\sigma^2$ unspecified). It is desired to use a likelihood ratio test to test $H_0$
against all possible alternatives. Find $\lambda$ and see whether the test can be based
on a familiar statistic.

Hint: In the notation of this section show that

\[
\sum_{i=1}^{n} (Y_i - \hat{\alpha})^2 = Q_3 + \hat{\beta}^2 \sum_{i=1}^{n} (x_i - \bar{x})^2.
\]


10.30. Using the notation of Section 10.2, assume that the means $\mu_j$ satisfy
a linear function of $j$, namely $\mu_j = c + d[j - (b + 1)/2]$. Let independent
random samples of size $a$ be taken from the $b$ normal distributions with
common unknown variance $\sigma^2$.

(a) Show that the maximum likelihood estimators of $c$ and $d$ are,
respectively, $\hat{c} = \bar{X}_{\cdot\cdot}$ and

\[
\hat{d} = \frac{\sum_{j=1}^{b} [j - (b + 1)/2](\bar{X}_{\cdot j} - \bar{X}_{\cdot\cdot})}{\sum_{j=1}^{b} [j - (b + 1)/2]^2}.
\]

(b) Show that

\[
\sum_{i=1}^{a}\sum_{j=1}^{b} (X_{ij} - \bar{X}_{\cdot\cdot})^2
 = \sum_{i=1}^{a}\sum_{j=1}^{b} \left[ X_{ij} - \bar{X}_{\cdot\cdot} - \hat{d}\left( j - \frac{b + 1}{2} \right) \right]^2
 + \hat{d}^2 \sum_{j=1}^{b} a \left( j - \frac{b + 1}{2} \right)^2.
\]

(c) Argue that the two terms in the right-hand member of part (b), once
divided by $\sigma^2$, are independent random variables with chi-square
distributions provided that $d = 0$.

(d) What F-statistic would be used to test the equality of the means, that
is, $H_0: d = 0$?

10.7 A Test of Independence 

Let $X$ and $Y$ have a bivariate normal distribution with means $\mu_1$ and
$\mu_2$, positive variances $\sigma_1^2$ and $\sigma_2^2$, and correlation coefficient $\rho$. We wish
to test the hypothesis that $X$ and $Y$ are independent. Because two jointly
normally distributed random variables are independent if and only
if $\rho = 0$, we test the hypothesis $H_0: \rho = 0$ against the hypothesis
$H_1: \rho \ne 0$. A likelihood ratio test will be used. Let $(X_1, Y_1)$,
$(X_2, Y_2), \ldots, (X_n, Y_n)$ denote a random sample of size $n > 2$ from the
bivariate normal distribution; that is, the joint p.d.f. of these $2n$ random
variables is given by

\[
f(x_1, y_1) f(x_2, y_2) \cdots f(x_n, y_n).
\]

Although it is fairly difficult to show, the statistic that is defined by the
likelihood ratio $\lambda$ is a function of the statistic

\[
R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}.
\]

This statistic $R$ is called the correlation coefficient of the random
sample. The likelihood ratio principle, which calls for the rejection
of $H_0$ if $\lambda \le \lambda_0$, is equivalent to the computed value of $|R| \ge c$. That
is, if the absolute value of the correlation coefficient of the sample
is too large, we reject the hypothesis that the correlation coefficient
of the distribution is equal to zero. To determine a value of $c$ for
a satisfactory significance level, it will be necessary to obtain the
distribution of $R$, or a function of $R$, when $H_0$ is true. This will now
be done.

Let $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, $n > 2$, where $x_1, x_2, \ldots, x_n$ and
$\bar{x} = \sum x_i/n$ are fixed numbers such that $\sum (x_i - \bar{x})^2 > 0$. Consider the
conditional p.d.f. of $Y_1, Y_2, \ldots, Y_n$, given that $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$.
Because $Y_1, Y_2, \ldots, Y_n$ are independent and, with
$\rho = 0$, are also independent of $X_1, X_2, \ldots, X_n$, this conditional p.d.f.
is given by

\[
\left( \frac{1}{\sqrt{2\pi}\,\sigma_2} \right)^n \exp\left\{ -\frac{\sum_{i=1}^{n} (y_i - \mu_2)^2}{2\sigma_2^2} \right\}.
\]

Let $R_c$ be the correlation coefficient, given $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$,
so that

\[
\frac{R_c \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}
 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) Y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\]

is like $\hat{\beta}$ of Section 10.6 and has mean zero when $\rho = 0$. Thus, referring
to $T_2$ of Section 10.6, we see that

\[
\frac{R_c \sqrt{\sum (Y_i - \bar{Y})^2} \Big/ \sqrt{\sum (x_i - \bar{x})^2}}
     {\sqrt{\dfrac{\sum_{i=1}^{n} \left\{ Y_i - \bar{Y} - \left[ R_c \sqrt{\sum (Y_j - \bar{Y})^2} \Big/ \sqrt{\sum (x_j - \bar{x})^2} \right] (x_i - \bar{x}) \right\}^2}{(n - 2) \sum (x_i - \bar{x})^2}}}
 = \frac{R_c \sqrt{n - 2}}{\sqrt{1 - R_c^2}} \tag{1}
\]

has, given $X_1 = x_1, \ldots, X_n = x_n$, a conditional t-distribution with
$n - 2$ degrees of freedom. Note that the p.d.f., say $g(t)$, of this
t-distribution does not depend upon $x_1, x_2, \ldots, x_n$. Now the joint
p.d.f. of $X_1, X_2, \ldots, X_n$ and $R\sqrt{n - 2}/\sqrt{1 - R^2}$, where

\[
R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}},
\]

is the product of $g(t)$ and the joint p.d.f. of $X_1, X_2, \ldots, X_n$.
Integration on $x_1, x_2, \ldots, x_n$ yields the marginal p.d.f. of
$R\sqrt{n - 2}/\sqrt{1 - R^2}$; because $g(t)$ does not depend upon $x_1, x_2, \ldots, x_n$,
it is obvious that this marginal p.d.f. is $g(t)$, the conditional p.d.f. of
$R_c\sqrt{n - 2}/\sqrt{1 - R_c^2}$. The change-of-variable technique can now be
used to find the p.d.f. of $R$.

Remarks. Since $R$ has, when $\rho = 0$, a conditional distribution that does
not depend upon $x_1, x_2, \ldots, x_n$ (and hence that conditional distribution is, in
fact, the marginal distribution of $R$), we have the remarkable fact that $R$ is
independent of $X_1, X_2, \ldots, X_n$. It follows that $R$ is independent of every
function of $X_1, X_2, \ldots, X_n$ alone, that is, a function that does not depend upon
any $Y_i$. In like manner, $R$ is independent of every function of $Y_1, Y_2, \ldots, Y_n$
alone. Moreover, a careful review of the argument reveals that nowhere did
we use the fact that $X$ has a normal marginal distribution. Thus, if $X$ and $Y$
are independent, and if $Y$ has a normal distribution, then $R$ has the same
conditional distribution whatever be the distribution of $X$, subject to the
condition $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$. Moreover, if $\Pr\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 > 0 \right] = 1$, then $R$
has the same marginal distribution whatever be the distribution of $X$.

If we write $T = R\sqrt{n - 2}/\sqrt{1 - R^2}$, where $T$ has a t-distribution
with $n - 2 > 0$ degrees of freedom, it is easy to show, by the
change-of-variable technique (Exercise 10.34), that the p.d.f. of $R$ is
given by

\[
g(r) = \frac{\Gamma[(n - 1)/2]}{\Gamma(\tfrac{1}{2})\,\Gamma[(n - 2)/2]} (1 - r^2)^{(n - 4)/2}, \quad -1 < r < 1, \tag{2}
\]
\[
\phantom{g(r)} = 0 \quad \text{elsewhere.}
\]

We have now solved the problem of the distribution of $R$,
when $\rho = 0$ and $n > 2$, or, perhaps more conveniently, that of
$R\sqrt{n - 2}/\sqrt{1 - R^2}$. The likelihood ratio test of the hypothesis
$H_0: \rho = 0$ against all alternatives $H_1: \rho \ne 0$ may be based either on the
statistic $R$ or on the statistic $R\sqrt{n - 2}/\sqrt{1 - R^2} = T$, although the
latter is easier to use. In either case the significance level of the test is

\[
\alpha = \Pr(|R| \ge c_1; H_0) = \Pr(|T| \ge c_2; H_0),
\]

where the constants $c_1$ and $c_2$ are chosen so as to give the desired value
of $\alpha$.
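A short sketch of how the statistics $R$ and $T$ might be computed from a sample follows. The function names and the five data pairs are hypothetical; the observed $|T|$ would be compared with a tabled quantile of the t-distribution with $n - 2$ degrees of freedom.

```python
import math


def sample_correlation(x, y):
    """Correlation coefficient R of the paired sample (x_i, y_i)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """T = R * sqrt(n - 2) / sqrt(1 - R^2); requires n > 2 and |R| < 1."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical sample of size n = 5.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 5.0]
r = sample_correlation(xs, ys)
t = t_statistic(r, len(xs))
print(r, t)   # r is 0.8 up to rounding; t is about 2.309
```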

Remark. It is also possible to obtain an approximate test of size $\alpha$ by using
the fact that

\[
W = \frac{1}{2} \ln\left( \frac{1 + R}{1 - R} \right)
\]

has an approximate normal distribution with mean $\frac{1}{2} \ln[(1 + \rho)/(1 - \rho)]$ and
variance $1/(n - 3)$. We accept this statement without proof. Thus a test of
$H_0: \rho = 0$ can be based on the statistic

\[
Z = \frac{\frac{1}{2} \ln[(1 + R)/(1 - R)] - \frac{1}{2} \ln[(1 + \rho)/(1 - \rho)]}{\sqrt{1/(n - 3)}},
\]

with $\rho = 0$ so that $\frac{1}{2} \ln[(1 + \rho)/(1 - \rho)] = 0$. However, using $W$, we can
also test hypotheses like $H_0: \rho = \rho_0$ against $H_1: \rho \ne \rho_0$, where $\rho_0$ is not
necessarily zero. In that case the hypothesized mean of $W$ is

\[
\frac{1}{2} \ln\left( \frac{1 + \rho_0}{1 - \rho_0} \right).
\]
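The approximate test based on $W$ is easy to compute. In the sketch below, the function names `fisher_w` and `z_statistic` and the values $r = 0.6$, $n = 28$ are hypothetical, chosen only to illustrate the formula for $Z$.

```python
import math


def fisher_w(r):
    """W = (1/2) ln[(1 + r)/(1 - r)], defined for -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def z_statistic(r, n, rho0=0.0):
    """Approximately N(0, 1) under H0: rho = rho0, for sample size n > 3."""
    return (fisher_w(r) - fisher_w(rho0)) / math.sqrt(1.0 / (n - 3))


# Hypothetical observed correlation and sample size.
z_zero = z_statistic(0.6, 28)             # tests rho = 0
z_half = z_statistic(0.6, 28, rho0=0.5)   # tests rho = 0.5
print(z_zero, z_half)
```

The observed value of $Z$ would be compared with a standard normal quantile, for instance 1.96 for a two-sided test of approximate size 0.05.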


EXERCISES 


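10.31. Show that

\[
R = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
  = \frac{\sum_{i=1}^{n} X_i Y_i - n\bar{X}\bar{Y}}{\sqrt{\left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right)\left( \sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2 \right)}}.
\]

10.32. A random sample of size $n = 6$ from a bivariate normal distribution
yields a value of the correlation coefficient of 0.89. Would we accept or
reject, at the 5 percent significance level, the hypothesis that $\rho = 0$?

10.33. Verify Equation (1) of this section.

10.34. Verify the p.d.f. (2) of this section.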


10.8 The Distributions of Certain Quadratic Forms

Remark. It is essential that the reader have the background of the
multivariate normal distribution as given in Section 4.10 to understand
Sections 10.8 and 10.9.

Let $X_i$, $i = 1, 2, \ldots, n$, denote independent random variables
which are $N(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$, respectively. Then
$Q = \sum_{i=1}^{n} (X_i - \mu_i)^2/\sigma_i^2$ is $\chi^2(n)$. Now $Q$ is a quadratic form in the $X_i - \mu_i$
and $Q$ is seen to be, apart from the coefficient $-\frac{1}{2}$, the random variable
which is defined by the exponent on the number $e$ in the joint
p.d.f. of $X_1, X_2, \ldots, X_n$. We shall now show that this result can be
generalized.

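Let $X_1, X_2, \ldots, X_n$ have a multivariate normal distribution with
p.d.f.

\[
\frac{1}{(2\pi)^{n/2} \sqrt{|V|}} \exp\left[ -\frac{(x - \mu)' V^{-1} (x - \mu)}{2} \right],
\]

where, as usual, the covariance matrix $V$ is positive definite. We shall
show that the random variable $Q$ (a quadratic form in the $X_i - \mu_i$),
which is defined by $(x - \mu)' V^{-1} (x - \mu)$, is $\chi^2(n)$. We have for the
m.g.f. $M(t)$ of $Q$ the integral

\[
\begin{aligned}
M(t) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2} \sqrt{|V|}}
   \exp\left[ t (x - \mu)' V^{-1} (x - \mu) - \frac{(x - \mu)' V^{-1} (x - \mu)}{2} \right] dx_1 \cdots dx_n \\
 &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \frac{1}{(2\pi)^{n/2} \sqrt{|V|}}
   \exp\left[ -\frac{(x - \mu)' V^{-1} (x - \mu)(1 - 2t)}{2} \right] dx_1 \cdots dx_n.
\end{aligned}
\]

With $V^{-1}$ positive definite, the integral is seen to exist for all real values
of $t < \frac{1}{2}$. Moreover, $(1 - 2t)V^{-1}$, $t < \frac{1}{2}$, is a positive definite matrix
and, since $|(1 - 2t)V^{-1}| = (1 - 2t)^n |V^{-1}|$, it follows that

\[
\frac{1}{(2\pi)^{n/2} \sqrt{|V|/(1 - 2t)^n}} \exp\left[ -\frac{(x - \mu)' V^{-1} (x - \mu)(1 - 2t)}{2} \right]
\]

can be treated as a multivariate normal p.d.f. If we multiply our
integrand by $(1 - 2t)^{n/2}$, we have this multivariate p.d.f. Thus the
m.g.f. of $Q$ is given by

\[
M(t) = \frac{1}{(1 - 2t)^{n/2}}, \quad t < \tfrac{1}{2},
\]

and $Q$ is $\chi^2(n)$, as we wished to show. This fact is the basis of the
chi-square tests that were discussed in Chapter 6.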

The remarkable fact that the random variable which is defined by
$(x - \mu)' V^{-1} (x - \mu)$ is $\chi^2(n)$ stimulates a number of questions about
quadratic forms in normally distributed variables. We would like to
treat this problem in complete generality, but limitations of space
forbid this, and we find it necessary to restrict ourselves to some special
cases.

Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$, $\sigma^2 > 0$. Let $X' = [X_1, X_2, \ldots, X_n]$ and let
$A$ denote an arbitrary $n \times n$ real symmetric matrix. We shall investigate
the distribution of the quadratic form $X'AX$. For instance, we know
that $X'I_nX/\sigma^2 = X'X/\sigma^2 = \sum_{i=1}^{n} X_i^2/\sigma^2$ is $\chi^2(n)$. First we shall find the
m.g.f. of $X'AX/\sigma^2$. Then we shall investigate the conditions that must
be imposed upon the real symmetric matrix $A$ if $X'AX/\sigma^2$ is to have a
chi-square distribution. This m.g.f. is given by

\[
\begin{aligned}
M(t) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n
   \exp\left( \frac{t\,x'Ax}{\sigma^2} - \frac{x'x}{2\sigma^2} \right) dx_1 \cdots dx_n \\
 &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n
   \exp\left[ -\frac{x'(I - 2tA)x}{2\sigma^2} \right] dx_1 \cdots dx_n,
\end{aligned}
\]

where $I = I_n$. The matrix $I - 2tA$ is positive definite if we take $|t|$
sufficiently small, say $|t| < h$, $h > 0$. Moreover, we can treat

\[
\frac{1}{(2\pi)^{n/2} \sqrt{|(I - 2tA)^{-1}\sigma^2|}} \exp\left[ -\frac{x'(I - 2tA)x}{2\sigma^2} \right]
\]

as a multivariate normal p.d.f. Now $|(I - 2tA)^{-1}\sigma^2|^{1/2} = \sigma^n/|I - 2tA|^{1/2}$.
If we multiply our integrand by $|I - 2tA|^{1/2}$, we have this multivariate
p.d.f. Hence the m.g.f. of $X'AX/\sigma^2$ is given by

\[
M(t) = |I - 2tA|^{-1/2}, \quad |t| < h. \tag{1}
\]

It proves useful to express this m.g.f. in a different form. To do this,
let $a_1, a_2, \ldots, a_n$ denote the characteristic numbers of $A$ and let $L$
denote an $n \times n$ orthogonal matrix such that $L'AL = \operatorname{diag}[a_1, a_2, \ldots, a_n]$. Thus

\[
L'(I - 2tA)L =
\begin{bmatrix}
1 - 2ta_1 & 0 & \cdots & 0 \\
0 & 1 - 2ta_2 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 - 2ta_n
\end{bmatrix}.
\]

Then

\[
\prod_{i=1}^{n} (1 - 2ta_i) = |L'(I - 2tA)L| = |I - 2tA|.
\]

Accordingly, we can write $M(t)$, as given in Equation (1), in the form

\[
M(t) = \left[ \prod_{i=1}^{n} (1 - 2ta_i) \right]^{-1/2}, \quad |t| < h. \tag{2}
\]


Let $r$, $0 < r \le n$, denote the rank of the real symmetric matrix $A$.
Then exactly $r$ of the real numbers $a_1, a_2, \ldots, a_n$, say $a_1, \ldots, a_r$, are
not zero and exactly $n - r$ of these numbers, say $a_{r+1}, \ldots, a_n$, are zero.
Thus we can write the m.g.f. of $X'AX/\sigma^2$ as

\[
M(t) = [(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r)]^{-1/2}.
\]

Now that we have found, in suitable form, the m.g.f. of our random
variable, let us turn to the question of the conditions that must be
imposed if $X'AX/\sigma^2$ is to have a chi-square distribution. Assume that
$X'AX/\sigma^2$ is $\chi^2(k)$. Then

\[
M(t) = [(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r)]^{-1/2} = (1 - 2t)^{-k/2},
\]

or, equivalently,

\[
(1 - 2ta_1)(1 - 2ta_2) \cdots (1 - 2ta_r) = (1 - 2t)^k, \quad |t| < h.
\]

Because the positive integers $r$ and $k$ are the degrees of these
polynomials, and because these polynomials are equal for infinitely
many values of $t$, we have $k = r$, the rank of $A$. Moreover, the
uniqueness of the factorization of a polynomial implies that
$a_1 = a_2 = \cdots = a_r = 1$. If each of the nonzero characteristic numbers
of a real symmetric matrix is one, the matrix is idempotent, that is,
$A^2 = A$, and conversely (see Exercise 10.38). Accordingly, if $X'AX/\sigma^2$
has a chi-square distribution, then $A^2 = A$ and the random variable is
$\chi^2(r)$, where $r$ is the rank of $A$. Conversely, if $A$ is of rank $r$, $0 < r \le n$,
and if $A^2 = A$, then $A$ has exactly $r$ characteristic numbers that are
equal to one, and the remaining $n - r$ characteristic numbers are equal
to zero. Thus the m.g.f. of $X'AX/\sigma^2$ is given by $(1 - 2t)^{-r/2}$, $t < \frac{1}{2}$, and
$X'AX/\sigma^2$ is $\chi^2(r)$. This establishes the following theorem.

Theorem 2. Let $Q$ denote a random variable which is a quadratic
form in the observations of a random sample of size $n$ from a distribution
which is $N(0, \sigma^2)$. Let $A$ denote the symmetric matrix of $Q$ and let $r$,
$0 < r \le n$, denote the rank of $A$. Then $Q/\sigma^2$ is $\chi^2(r)$ if and only if $A^2 = A$.

Remark. If the normal distribution in Theorem 2 is $N(\mu, \sigma^2)$, the condition
$A^2 = A$ remains a necessary and sufficient condition that $Q/\sigma^2$ have a
chi-square distribution. In general, however, $Q/\sigma^2$ is not $\chi^2(r)$ but, instead,
$Q/\sigma^2$ has a noncentral chi-square distribution if $A^2 = A$. The number
of degrees of freedom is $r$, the rank of $A$, and the noncentrality parameter is
$\mu'A\mu/\sigma^2$, where $\mu' = [\mu, \mu, \ldots, \mu]$. Since $\mu'A\mu = \mu^2 \sum_i \sum_j a_{ij}$, where $A = [a_{ij}]$,
then, if $\mu \ne 0$, the conditions $A^2 = A$ and $\sum_i \sum_j a_{ij} = 0$ are necessary and sufficient
conditions that $Q/\sigma^2$ be central $\chi^2(r)$. Moreover, the theorem may be extended
to a quadratic form in random variables which have a multivariate normal
distribution with positive definite covariance matrix $V$; here the necessary and
sufficient condition that $Q$ have a chi-square distribution is $AVA = A$.
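The condition $A^2 = A$ of Theorem 2 can be checked numerically for particular matrices. The sketch below, with hypothetical helper names and $n = 4$ chosen for illustration, uses the matrix $(1/n)P$, where every element of $P$ equals one, and its complement $I - (1/n)P$; for a symmetric idempotent matrix the trace equals the rank, which gives the degrees of freedom.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def close(a, b, tol=1e-12):
    """True when two matrices agree elementwise within tol."""
    return all(abs(a[i][j] - b[i][j]) <= tol
               for i in range(len(a)) for j in range(len(a[0])))


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


n = 4
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
A = [[1.0 / n for _ in range(n)] for _ in range(n)]                  # (1/n)P
B = [[identity[i][j] - A[i][j] for j in range(n)] for i in range(n)]  # I - (1/n)P
zero = [[0.0] * n for _ in range(n)]

print(close(matmul(A, A), A), trace(A))   # idempotent, trace 1
print(close(matmul(B, B), B), trace(B))   # idempotent, trace n - 1
print(close(matmul(A, B), zero))          # the product AB is the zero matrix
```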

EXERCISES

10.35. Let $Q = X_1X_2 - X_3X_4$, where $X_1, X_2, X_3, X_4$ is a random sample of size
4 from a distribution which is $N(0, \sigma^2)$. Show that $Q/\sigma^2$ does not have a
chi-square distribution. Find the m.g.f. of $Q/\sigma^2$.

10.36. Let $X' = [X_1, X_2]$ be bivariate normal with matrix of means
$\mu' = [\mu_1, \mu_2]$ and positive definite covariance matrix $V$. Let

\[
Q_1 = \frac{X_1^2}{\sigma_1^2(1 - \rho^2)} - 2\rho \frac{X_1X_2}{\sigma_1\sigma_2(1 - \rho^2)} + \frac{X_2^2}{\sigma_2^2(1 - \rho^2)}.
\]

Show that $Q_1$ is $\chi^2(r, \theta)$ and find $r$ and $\theta$. When and only when does $Q_1$ have
a central chi-square distribution?

10.37. Let $X' = [X_1, X_2, X_3]$ denote a random sample of size 3 from a
distribution that is $N(4, 8)$ and let


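\[
A = \begin{bmatrix}
\tfrac{1}{2} & 0 & \tfrac{1}{2} \\
0 & 1 & 0 \\
\tfrac{1}{2} & 0 & \tfrac{1}{2}
\end{bmatrix}.
\]

Justify the assertion that $X'AX/\sigma^2$ is $\chi^2(2, 6)$.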

10.38. Let $A$ be a real symmetric matrix. Prove that each of the nonzero
characteristic numbers of $A$ is equal to 1 if and only if $A^2 = A$.

Hint: Let $L$ be an orthogonal matrix such that $L'AL = \operatorname{diag}[a_1, a_2, \ldots, a_n]$
and note that $A$ is idempotent if and only if $L'AL$ is
idempotent.

10.39. The sum of the elements on the principal diagonal of a square matrix
$A$ is called the trace of $A$ and is denoted by $\operatorname{tr} A$.

(a) If $B$ is $n \times m$ and $C$ is $m \times n$, prove that $\operatorname{tr}(BC) = \operatorname{tr}(CB)$.

(b) If $A$ is a square matrix and if $L$ is an orthogonal matrix, use the result
of part (a) to show that $\operatorname{tr}(L'AL) = \operatorname{tr} A$.

(c) If $A$ is a real symmetric idempotent matrix, use the result of part (b)
to prove that the rank of $A$ is equal to $\operatorname{tr} A$.

10.40. Let $A = [a_{ij}]$ be a real symmetric matrix. Prove that $\sum_i \sum_j a_{ij}^2$ is equal to
the sum of the squares of the characteristic numbers of $A$.

Hint: If $L$ is an orthogonal matrix, show that $\sum_j \sum_i a_{ij}^2 =
\operatorname{tr}(A^2) = \operatorname{tr}(L'A^2L) = \operatorname{tr}[(L'AL)(L'AL)]$.

10.41. Let $\bar{X}$ and $S^2$ denote, respectively, the mean and the variance of a
random sample of size $n$ from a distribution which is $N(0, \sigma^2)$.

(a) If $A$ denotes the symmetric matrix of $n\bar{X}^2$, show that $A = (1/n)P$, where
$P$ is the $n \times n$ matrix, each of whose elements is equal to one.

(b) Demonstrate that $A$ is idempotent and that $\operatorname{tr} A = 1$. Thus $n\bar{X}^2/\sigma^2$
is $\chi^2(1)$.

(c) Show that the symmetric matrix $B$ of $nS^2$ is $I - (1/n)P$.

(d) Demonstrate that $B$ is idempotent and that $\operatorname{tr} B = n - 1$. Thus $nS^2/\sigma^2$
is $\chi^2(n - 1)$, as previously proved otherwise.

(e) Show that the product matrix $AB$ is the zero matrix.


10.9 The Independence of Certain Quadratic Forms 


We have previously investigated the independence of linear
functions of normally distributed variables (see Exercise 4.132). In this
section we shall prove some theorems about the independence of
quadratic forms. As we remarked on p. 483, we shall confine our
attention to normally distributed variables that constitute a random
sample of size $n$ from a distribution that is $N(0, \sigma^2)$.

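Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Let $A$ and $B$ denote two real symmetric
matrices, each of order $n$. Let $X' = [X_1, X_2, \ldots, X_n]$ and consider the
two quadratic forms $X'AX$ and $X'BX$. We wish to show that these
quadratic forms are independent if and only if $AB = 0$, the zero matrix.
We shall first compute the m.g.f. $M(t_1, t_2)$ of $X'AX/\sigma^2$ and $X'BX/\sigma^2$.
We have

\[
\begin{aligned}
M(t_1, t_2) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n
   \exp\left( \frac{t_1 x'Ax}{\sigma^2} + \frac{t_2 x'Bx}{\sigma^2} - \frac{x'x}{2\sigma^2} \right) dx_1 \cdots dx_n \\
 &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n
   \exp\left[ -\frac{x'(I - 2t_1A - 2t_2B)x}{2\sigma^2} \right] dx_1 \cdots dx_n.
\end{aligned}
\]

The matrix $I - 2t_1A - 2t_2B$ is positive definite if we take $|t_1|$ and $|t_2|$
sufficiently small, say $|t_1| < h_1$, $|t_2| < h_2$, where $h_1, h_2 > 0$. Then, as on
p. 483, we have

\[
M(t_1, t_2) = |I - 2t_1A - 2t_2B|^{-1/2}, \quad |t_1| < h_1, \quad |t_2| < h_2.
\]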

Let us assume that $X'AX/\sigma^2$ and $X'BX/\sigma^2$ are independent (so that
likewise are $X'AX$ and $X'BX$) and prove that $AB = 0$. Thus we assume
that

$$M(t_1, t_2) = M(t_1, 0)M(0, t_2) \qquad (1)$$

for all $t_1$ and $t_2$ for which $|t_i| < h_i$, $i = 1, 2$. Identity (1) is equivalent to
the identity

$$|I - 2t_1A - 2t_2B| = |I - 2t_1A|\,|I - 2t_2B|, \qquad |t_i| < h_i, \ i = 1, 2. \qquad (2)$$

Let $r > 0$ denote the rank of $A$ and let $a_1, a_2, \ldots, a_r$ denote the $r$
nonzero characteristic numbers of $A$. There exists an orthogonal
matrix $L$ such that

$$L'AL = \begin{bmatrix} C_{11} & 0 \\ 0 & 0 \end{bmatrix} = C, \qquad C_{11} = \operatorname{diag}(a_1, a_2, \ldots, a_r),$$

for a suitable ordering of $a_1, a_2, \ldots, a_r$. Then $L'BL$ may be written
in the identically partitioned form

$$L'BL = \begin{bmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{bmatrix} = D.$$

The identity (2) may be written as

$$|L'|\,|I - 2t_1A - 2t_2B|\,|L| = |L'|\,|I - 2t_1A|\,|L|\,|L'|\,|I - 2t_2B|\,|L|, \qquad (2')$$

or as

$$|I - 2t_1C - 2t_2D| = |I - 2t_1C|\,|I - 2t_2D|. \qquad (3)$$

The coefficient of $(-2t_1)^r$ in the right-hand member of Equation (3) is
seen by inspection to be $a_1 a_2 \cdots a_r |I - 2t_2D|$. It is not so easy to find
the coefficient of $(-2t_1)^r$ in the left-hand member of Equation (3).
Conceive of expanding this determinant in terms of minors of order $r$
formed from the first $r$ columns. One term in this expansion is the
product of the minor of order $r$ in the upper left-hand corner, namely,
$|I_r - 2t_1C_{11} - 2t_2D_{11}|$, and the minor of order $n - r$ in the lower
right-hand corner, namely, $|I_{n-r} - 2t_2D_{22}|$. Moreover, this product is
the only term in the expansion of the determinant that involves
$(-2t_1)^r$. Thus the coefficient of $(-2t_1)^r$ in the left-hand member of
Equation (3) is $a_1 a_2 \cdots a_r |I_{n-r} - 2t_2D_{22}|$. If we equate these coefficients
of $(-2t_1)^r$, we have, for all $t_2$, $|t_2| < h_2$,

$$|I - 2t_2D| = |I_{n-r} - 2t_2D_{22}|. \qquad (4)$$

Equation (4) implies that the nonzero characteristic numbers of the
matrices $D$ and $D_{22}$ are the same (see Exercise 10.49). Recall that the
sum of the squares of the characteristic numbers of a symmetric matrix
is equal to the sum of the squares of the elements of that matrix (see
Exercise 10.40). Thus the sum of the squares of the elements of matrix
$D$ is equal to the sum of the squares of the elements of $D_{22}$. Since the
elements of the matrix $D$ are real, it follows that each of the elements
of $D_{11}$, $D_{12}$, and $D_{21}$ is zero. Accordingly, we can write $D$ in the form

$$D = L'BL = \begin{bmatrix} 0 & 0 \\ 0 & D_{22} \end{bmatrix}.$$

Thus $CD = L'ALL'BL = 0$ and $L'ABL = 0$ and $AB = 0$, as we wished
to prove.

To complete the proof of the theorem, we assume that $AB = 0$. We
are to show that $X'AX/\sigma^2$ and $X'BX/\sigma^2$ are independent. We have, for
all real values of $t_1$ and $t_2$,

$$(I - 2t_1A)(I - 2t_2B) = I - 2t_1A - 2t_2B,$$

since $AB = 0$. Thus

$$|I - 2t_1A - 2t_2B| = |I - 2t_1A|\,|I - 2t_2B|.$$

Since the m.g.f. of $X'AX/\sigma^2$ and $X'BX/\sigma^2$ is given by

$$M(t_1, t_2) = |I - 2t_1A - 2t_2B|^{-1/2}, \qquad |t_i| < h_i, \ i = 1, 2,$$

we have

$$M(t_1, t_2) = M(t_1, 0)M(0, t_2),$$

and the proof of the following theorem is complete.

Theorem 3. Let $Q_1$ and $Q_2$ denote random variables which are
quadratic forms in the observations of a random sample of size $n$ from
a distribution which is $N(0, \sigma^2)$. Let $A$ and $B$ denote, respectively, the real
symmetric matrices of $Q_1$ and $Q_2$. The random variables $Q_1$ and $Q_2$ are
independent if and only if $AB = 0$.

Remark. Theorem 3 remains valid if the random sample is from a
distribution which is $N(\mu, \sigma^2)$, whatever be the real value of $\mu$. Moreover,
Theorem 3 may be extended to quadratic forms in random variables that have
a joint multivariate normal distribution with a positive definite covariance
matrix $V$. The necessary and sufficient condition for the independence
of two such quadratic forms with symmetric matrices $A$ and $B$ then
becomes $AVB = 0$. In our Theorem 3, we have $V = \sigma^2 I$, so that
$AVB = A\sigma^2 IB = \sigma^2 AB = 0$.
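
The condition $AB = 0$ of Theorem 3 is easy to check numerically for a concrete pair of forms. The following is a minimal sketch, not part of the original text, that uses the matrices of Exercise 10.41, $A = (1/n)P$ for $n\bar{X}^2$ and $B = I - (1/n)P$ for $nS^2$. The helper names `matmul`, `mean_and_variance_matrices`, and `is_zero` are illustrative assumptions; exact rational arithmetic avoids rounding questions.

```python
from fractions import Fraction


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mean_and_variance_matrices(n):
    """Return (A, B), the matrices of n*Xbar^2 and n*S^2 for sample size n."""
    one_over_n = Fraction(1, n)
    # A = (1/n) P, where every element of P equals one.
    a = [[one_over_n for _ in range(n)] for _ in range(n)]
    # B = I - (1/n) P.
    b = [[(1 if i == j else 0) - one_over_n for j in range(n)]
         for i in range(n)]
    return a, b


def is_zero(m):
    """True when every element of the matrix is zero."""
    return all(entry == 0 for row in m for entry in row)


A, B = mean_and_variance_matrices(4)
print(is_zero(matmul(A, B)))                  # checks AB = 0
print(matmul(A, A) == A, matmul(B, B) == B)   # checks that A and B are idempotent
```

With $AB = 0$ confirmed, Theorem 3 gives the independence of $n\bar{X}^2$ and $nS^2$ in the $N(0, \sigma^2)$ case.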

We shall next prove Theorem 1 that was stated in Section 10.1.

Theorem 4. Let $Q = Q_1 + \cdots + Q_{k-1} + Q_k$, where $Q$,
$Q_1, \ldots, Q_{k-1}, Q_k$ are $k + 1$ random variables that are quadratic forms
in the observations of a random sample of size $n$ from a distribution which
is $N(0, \sigma^2)$. Let $Q/\sigma^2$ be $\chi^2(r)$, let $Q_i/\sigma^2$ be $\chi^2(r_i)$, $i = 1, 2, \ldots, k - 1$, and
let $Q_k$ be nonnegative. Then the random variables $Q_1, Q_2, \ldots, Q_k$ are
independent and, hence, $Q_k/\sigma^2$ is $\chi^2(r_k = r - r_1 - \cdots - r_{k-1})$.

Proof. Take first the case of $k = 2$ and let the real symmetric
matrices of $Q$, $Q_1$, and $Q_2$ be denoted, respectively, by $A$, $A_1$, $A_2$. We
are given that $Q = Q_1 + Q_2$ or, equivalently, that $A = A_1 + A_2$. We are
also given that $Q/\sigma^2$ is $\chi^2(r)$ and that $Q_1/\sigma^2$ is $\chi^2(r_1)$. In accordance with
Theorem 2, p. 484, we have $A^2 = A$ and $A_1^2 = A_1$. Since $Q_2 \ge 0$, each
of the matrices $A$, $A_1$, and $A_2$ is positive semidefinite. Because $A^2 = A$,
we can find an orthogonal matrix $L$ such that

$$L'AL = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix}.$$

If then we multiply both members of $A = A_1 + A_2$ on the left by $L'$ and
on the right by $L$, we have

$$\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} = L'A_1L + L'A_2L.$$

Now each of $A_1$ and $A_2$, and hence each of $L'A_1L$ and $L'A_2L$, is positive
semidefinite. Recall that, if a real symmetric matrix is positive
semidefinite, each element on the principal diagonal is positive or
zero. Moreover, if an element on the principal diagonal is zero, then
all elements in that row and all elements in that column are zero. Thus
$L'AL = L'A_1L + L'A_2L$ can be written as

$$\begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} G_r & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} H_r & 0 \\ 0 & 0 \end{bmatrix}. \qquad (5)$$

Since $A_1^2 = A_1$, we have

$$(L'A_1L)^2 = L'A_1L = \begin{bmatrix} G_r & 0 \\ 0 & 0 \end{bmatrix}.$$

If we multiply both members of Equation (5) on the left by the matrix
$L'A_1L$, we see that

$$\begin{bmatrix} G_r & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} G_r & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} G_rH_r & 0 \\ 0 & 0 \end{bmatrix},$$

or, equivalently, $L'A_1L = L'A_1L + (L'A_1L)(L'A_2L)$. Thus $(L'A_1L)(L'A_2L) = 0$
and $A_1A_2 = 0$. In accordance with Theorem 3, $Q_1$ and $Q_2$
are independent. This independence immediately implies that $Q_2/\sigma^2$ is
$\chi^2(r_2 = r - r_1)$. This completes the proof when $k = 2$. For $k > 2$, the
proof may be made by induction. We shall merely indicate how this
can be done by using $k = 3$. Take $A = A_1 + A_2 + A_3$, where $A^2 = A$,
$A_1^2 = A_1$, $A_2^2 = A_2$, and $A_3$ is positive semidefinite. Write
$A = A_1 + (A_2 + A_3) = A_1 + B_1$, say. Now $A^2 = A$, $A_1^2 = A_1$, and $B_1$ is
positive semidefinite. In accordance with the case of $k = 2$, we have
$A_1B_1 = 0$, so that $B_1^2 = B_1$. With $B_1 = A_2 + A_3$, where $B_1^2 = B_1$,
$A_2^2 = A_2$, it follows from the case of $k = 2$ that $A_2A_3 = 0$ and $A_3^2 = A_3$.
If we regroup by writing $A = A_2 + (A_1 + A_3)$, we obtain $A_1A_3 = 0$, and
so on.


Remark. In our statement of Theorem 4 we took $X_1, X_2, \ldots, X_n$ to be
observations of a random sample from a distribution which is $N(0, \sigma^2)$. We
did this because our proof of Theorem 3 was restricted to that case. In fact,
if $Q, Q_1, \ldots, Q_k$ are quadratic forms in any normal variables (including
multivariate normal variables), if $Q = Q_1 + \cdots + Q_k$, if $Q, Q_1, \ldots, Q_{k-1}$
are central or noncentral chi-square, and if $Q_k$ is nonnegative, then $Q_1, \ldots, Q_k$
are independent and $Q_k$ is either central or noncentral chi-square.

This section will conclude with a proof of a frequently quoted
theorem due to Cochran.

Theorem 5. Let $X_1, X_2, \ldots, X_n$ denote a random sample from a
distribution which is $N(0, \sigma^2)$. Let the sum of the squares of these
observations be written in the form

$$\sum_{i=1}^{n} X_i^2 = Q_1 + Q_2 + \cdots + Q_k,$$

where $Q_j$ is a quadratic form in $X_1, X_2, \ldots, X_n$, with matrix $A_j$ which
has rank $r_j$, $j = 1, 2, \ldots, k$. The random variables $Q_1, Q_2, \ldots, Q_k$ are
independent and $Q_j/\sigma^2$ is $\chi^2(r_j)$, $j = 1, 2, \ldots, k$, if and only if $\sum_{j=1}^{k} r_j = n$.

Proof. First assume the two conditions $\sum_{j=1}^{k} r_j = n$ and $\sum_{i=1}^{n} X_i^2 = \sum_{j=1}^{k} Q_j$
to be satisfied. The latter equation implies that $I = A_1 + A_2 + \cdots + A_k$.
Let $B_i = I - A_i$. That is, $B_i$ is the sum of the matrices
$A_1, \ldots, A_k$ exclusive of $A_i$. Let $R_i$ denote the rank of $B_i$. Since the rank
of the sum of several matrices is less than or equal to the sum of the
ranks, we have $R_i \le \sum_{j=1}^{k} r_j - r_i = n - r_i$. However, $I = A_i + B_i$, so that
$n \le r_i + R_i$ and $n - r_i \le R_i$. Hence $R_i = n - r_i$. The characteristic
numbers of $B_i$ are the roots of the equation $|B_i - \lambda I| = 0$. Since
$B_i = I - A_i$, this equation can be written as $|I - A_i - \lambda I| = 0$. Thus we
have $|A_i - (1 - \lambda)I| = 0$. But each root of the last equation is one minus
a characteristic number of $A_i$. Since $B_i$ has exactly $n - R_i = r_i$
characteristic numbers that are zero, then $A_i$ has exactly $r_i$ characteristic
numbers that are equal to 1. However, $r_i$ is the rank of $A_i$. Thus each
of the $r_i$ nonzero characteristic numbers of $A_i$ is 1. That is, $A_i^2 = A_i$ and
thus $Q_i/\sigma^2$ is $\chi^2(r_i)$, $i = 1, 2, \ldots, k$. In accordance with Theorem 4, the
random variables $Q_1, Q_2, \ldots, Q_k$ are independent.

To complete the proof of Theorem 5, take

$$\sum_{i=1}^{n} X_i^2 = Q_1 + Q_2 + \cdots + Q_k,$$

let $Q_1, Q_2, \ldots, Q_k$ be independent, and let $Q_j/\sigma^2$ be $\chi^2(r_j)$,
$j = 1, 2, \ldots, k$. Then $\sum_{j=1}^{k} Q_j/\sigma^2$ is $\chi^2\left(\sum_{j=1}^{k} r_j\right)$. But $\sum_{j=1}^{k} Q_j/\sigma^2 = \sum_{i=1}^{n} X_i^2/\sigma^2$ is
$\chi^2(n)$. Thus $\sum_{j=1}^{k} r_j = n$ and the proof is complete.
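
The rank condition of Theorem 5 can be illustrated with the familiar decomposition $\sum X_i^2 = n\bar{X}^2 + \sum (X_i - \bar{X})^2$, whose matrices are $A_1 = (1/n)P$ and $A_2 = I - (1/n)P$ from Exercise 10.41. The sketch below is an illustration added here, not the book's own computation; the function names `rank` and `cochran_rank_check` are hypothetical. It computes both ranks by exact Gaussian elimination so that their sum can be compared with $n$.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a nonzero pivot in this column at or below row r.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def cochran_rank_check(n):
    """Ranks of A1 = (1/n)P and A2 = I - (1/n)P for sample size n."""
    a1 = [[Fraction(1, n)] * n for _ in range(n)]
    a2 = [[(1 if i == j else 0) - Fraction(1, n) for j in range(n)]
          for i in range(n)]
    return rank(a1), rank(a2)


r1, r2 = cochran_rank_check(5)
print(r1, r2, r1 + r2 == 5)
```

When the ranks add to $n$, Theorem 5 yields both the independence of the two forms and their chi-square distributions at once.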

EXERCISES

10.42. Let $X_1, X_2, X_3$ be a random sample from the normal distribution
$N(0, \sigma^2)$. Are the quadratic forms $X_1^2 + 3X_1X_2 + X_2^2 + X_1X_3 + X_3^2$ and
$X_1^2 - 2X_1X_2 + \frac{2}{3}X_2^2 - 2X_1X_3 - X_3^2$ independent or dependent?


10.43. Let $X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Prove that $\sum_{i=1}^{n} X_i^2$ and every quadratic form,
which is nonidentically zero in $X_1, X_2, \ldots, X_n$, are dependent.

10.44. Let $X_1, X_2, X_3, X_4$ denote a random sample of size 4 from a
distribution which is $N(0, \sigma^2)$. Let $Y = \sum_{i=1}^{4} a_iX_i$, where $a_1, a_2, a_3$, and $a_4$ are
real constants. If $Y^2$ and $Q = X_1X_2 - X_3X_4$ are independent, determine $a_1$,
$a_2$, $a_3$, and $a_4$.

10.45. Let $A$ be the real symmetric matrix of a quadratic form $Q$ in the
observations of a random sample of size $n$ from a distribution which is
$N(0, \sigma^2)$. Given that $Q$ and the mean $\bar{X}$ of the sample are independent.
What can be said of the elements of each row (column) of $A$?

Hint: Are $Q$ and $\bar{X}^2$ independent?

10.46. Let $A_1, A_2, \ldots, A_k$ be the matrices of $k > 2$ quadratic forms
$Q_1, Q_2, \ldots, Q_k$ in the observations of a random sample of size $n$ from a
distribution which is $N(0, \sigma^2)$. Prove that the pairwise independence of these
forms implies that they are mutually independent.

Hint: Show that $A_iA_j = 0$, $i \ne j$, permits $E[\exp(t_1Q_1 + t_2Q_2 + \cdots + t_kQ_k)]$
to be written as a product of the moment-generating
functions of $Q_1, Q_2, \ldots, Q_k$.

10.47. Let $X' = [X_1, X_2, \ldots, X_n]$, where $X_1, X_2, \ldots, X_n$ are observations
of a random sample from a distribution which is $N(0, \sigma^2)$. Let
$b' = [b_1, b_2, \ldots, b_n]$ be a real nonzero matrix, and let $A$ be a real symmetric
matrix of order $n$. Prove that the linear form $b'X$ and the quadratic form
$X'AX$ are independent if and only if $b'A = 0$. Use this fact to prove that
$b'X$ and $X'AX$ are independent if and only if the two quadratic forms,
$(b'X)^2 = X'bb'X$ and $X'AX$, are independent.

10.48. Let $Q_1$ and $Q_2$ be two nonnegative quadratic forms in the observations
of a random sample from a distribution which is $N(0, \sigma^2)$. Show that
another quadratic form $Q$ is independent of $Q_1 + Q_2$ if and only if $Q$ is
independent of each of $Q_1$ and $Q_2$.

Hint: Consider the orthogonal transformation that diagonalizes the
matrix of $Q_1 + Q_2$. After this transformation, what are the forms of the
matrices of $Q$, $Q_1$, and $Q_2$ if $Q$ and $Q_1 + Q_2$ are independent?

10.49. Prove that Equation (4) of this section implies that the nonzero
characteristic numbers of the matrices $D$ and $D_{22}$ are the same.

Hint: Let $\lambda = 1/(2t_2)$, $t_2 \ne 0$, and show that Equation (4) is equivalent
to $|D - \lambda I| = (-\lambda)^r |D_{22} - \lambda I_{n-r}|$.

10.50. Here $Q_1$ and $Q_2$ are quadratic forms in observations of a random
sample from $N(0, 1)$. If $Q_1$ and $Q_2$ are independent and if $Q_1 + Q_2$ has a
chi-square distribution, prove that $Q_1$ and $Q_2$ are chi-square variables.

10.51. Often in regression the mean of the random variable $Y$ is a linear
function of $p$ values $x_1, x_2, \ldots, x_p$, say $\beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p$, where
$\beta' = (\beta_1, \beta_2, \ldots, \beta_p)$ are the regression coefficients. Suppose that $n$ values,
$Y' = (Y_1, Y_2, \ldots, Y_n)$, are observed for the $x$-values in $X = (x_{ij})$, where $X$
is an $n \times p$ design matrix and its $i$th row is associated with
$Y_i$, $i = 1, 2, \ldots, n$. Assume that $Y$ is multivariate normal with mean $X\beta$
and covariance matrix $\sigma^2 I$, where $I$ is the $n \times n$ identity matrix.

(a) Note that $Y_1, Y_2, \ldots, Y_n$ are independent. Why?

(b) Since $Y$ should approximately equal its mean $X\beta$, we estimate $\beta$ by
solving the normal equations $X'Y = X'X\beta$ for $\beta$. Assuming that $X'X$ is
nonsingular, solve the equations to get $\hat{\beta} = (X'X)^{-1}X'Y$. Show that $\hat{\beta}$
has a multivariate normal distribution with mean $\beta$ and covariance
matrix $\sigma^2(X'X)^{-1}$.

(c) Show that

$$(Y - X\beta)'(Y - X\beta) = (\hat{\beta} - \beta)'(X'X)(\hat{\beta} - \beta) + (Y - X\hat{\beta})'(Y - X\hat{\beta}),$$

say $Q = Q_1 + Q_2$ for convenience.

(d) Show that $Q_1/\sigma^2$ is $\chi^2(p)$.

(e) Show that $Q_1$ and $Q_2$ are independent.

(f) Argue that $Q_2/\sigma^2$ is $\chi^2(n - p)$.

(g) Find $c$ so that $cQ_1/Q_2$ has an $F$-distribution.

(h) The fact that a value $d$ can be found so that $\Pr(cQ_1/Q_2 \le d) = 1 - \alpha$
could be used to find a $100(1 - \alpha)$ percent confidence ellipsoid for $\beta$.
Explain.

(i) If the coefficient matrix $\beta$ has the prior distribution that is multivariate
normal with mean matrix $\beta_0$ and covariance matrix $\Sigma_0$, what is the
posterior distribution of $\beta$, given $\hat{\beta}$?

10.52. Say that G.P.A. ($Y$) is thought to be a linear function of a "coded" high
school rank ($x_2$) and a "coded" American College Testing score ($x_3$),
namely, $\beta_1 + \beta_2x_2 + \beta_3x_3$. Note that all $x_1$ values equal 1. We observe the
following five points:


4 ^^ r 

1 2 3 

4 3 6 

2 2 4 

4 2 4 

3 、 2 4 


(a) Compute $X'X$ and $\hat{\beta} = (X'X)^{-1}X'Y$.

(b) Compute a 95 percent confidence ellipsoid for $\beta' = (\beta_1, \beta_2, \beta_3)$.

ADDITIONAL EXERCISES

10.53. Let $\mu_1, \mu_2, \mu_3$ be, respectively, the means of three normal distributions
with a common but unknown variance $\sigma^2$. In order to test, at the $\alpha = 5$
percent significance level, the hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$ against all
possible alternative hypotheses, we take an independent random sample of
size 4 from each of these distributions. Determine whether we accept or
reject $H_0$ if the observed values from these three distributions are,
respectively,


$X_3$: 10 6 9 9

10.54. The driver of a diesel-powered automobile decided to test the quality
of three types of diesel fuel sold in the area based on mpg. Test the null
hypothesis that the three means are equal using the following data. Make
the usual assumptions and take $\alpha = 0.05$.

Brand A: 38.7 39.2 40.1 38.9

Brand B: 41.9 42.3 41.3

Brand C: 40.8 41.2 39.5 38.9 40.3

10.55. We wish to compare compressive strengths of concrete corresponding
to $a = 3$ different drying methods (treatments). Concrete is mixed in batches
that are just large enough to produce three cylinders. Although care is
taken to achieve uniformity, we expect some variability among the $b = 5$
batches used to obtain the following compressive strengths. (There is little
reason to suspect interaction and hence only one observation is taken in
each cell.)


(a) Use the 5 percent significance level and test $H_A: \alpha_1 = \alpha_2 = \alpha_3 = 0$
against all alternatives.

(b) Use the 5 percent significance level and test $H_B: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$
against all alternatives.

                      Batch
Treatment    B_1   B_2   B_3   B_4   B_5
A_1           52    47    44    51    42
A_2           60    55    49    52    43
A_3           56    48    45    44    38

10.56. With $a = 3$ and $b = 4$, find $\mu$, $\alpha_i$, $\beta_j$, and $\gamma_{ij}$, if $\mu_{ij}$, $i = 1, 2, 3$ and
$j = 1, 2, 3, 4$, are given by

 6    7    7   12
10    3   11    8
 8    5    9   10

10.57. Two experiments gave the following results:

  n     x̄     ȳ    s_x   s_y     r
100    10    20     5     8    0.70
200    12    22     6    10    0.80

Calculate $r$ for the combined sample.

10.58. Consider the following matrices: $Y$ is $n \times 1$, $\beta$ is $p \times 1$, $X$ is $n \times p$ and
of rank $p$. Let $Y$ be $N(X\beta, \sigma^2 I)$. Discuss the joint p.d.f. of $\hat{\beta} = (X'X)^{-1}X'Y$
and $Y'[I - X(X'X)^{-1}X']Y/\sigma^2$.


10.59. Fit $y = a + x$ to the data

x    0   1   2
y    1   3   4

by the method of least squares.

10.60. Fit by the method of least squares the plane $z = a + bx + cy$ to the
five points $(x, y, z)$: $(-1, -2, 5)$, $(0, -2, 4)$, $(0, 0, 4)$, $(1, 0, 2)$, $(2, 1, 0)$.

10.61. Let the $4 \times 1$ matrix $Y$ be multivariate normal $N(X\beta, \sigma^2 I)$, where the
$4 \times 3$ design matrix equals



and $\beta$ is the $3 \times 1$ regression coefficient matrix.

(a) Find the mean matrix and the covariance matrix of $\hat{\beta} = (X'X)^{-1}X'Y$.

(b) If we observe $Y'$ to be equal to $(6, 1, 11, 3)$, compute $\hat{\beta}$.


10.62. Let the independent normal random variables $Y_1, Y_2, \ldots, Y_n$ have,
respectively, the probability density functions $N(\mu, \gamma^2x_i^2)$, $i = 1, 2, \ldots, n$,
where the given $x_1, x_2, \ldots, x_n$ are not all equal and no one of which is zero.
Discuss the test of the hypothesis $H_0: \gamma = 1$, $\mu$ unspecified, against all
alternatives $H_1: \gamma \ne 1$, $\mu$ unspecified.

10.63. Let $Y_1, Y_2, \ldots, Y_n$ be $n$ independent normal variables with common
unknown variance $\sigma^2$. Let $Y_i$ have mean $\beta x_i$, $i = 1, 2, \ldots, n$, where
$x_1, x_2, \ldots, x_n$ are known but not all the same and $\beta$ is an unknown
constant. Find the likelihood ratio test for $H_0: \beta = 0$ against all
alternatives. Show that this likelihood ratio test can be based on a statistic
that has a well-known distribution.

10.64. Consider the multivariate normal p.d.f. $f(x; \mu, \Sigma)$ where the known
parameters equal either $\mu_1, \Sigma_1$ or $\mu_2, \Sigma_2$, respectively.

(a) If $\Sigma_1 = \Sigma_2$ is known to equal $\Sigma$, classify $X$ as being in the second of these
distributions if

$$\frac{f(x; \mu_1, \Sigma)}{f(x; \mu_2, \Sigma)} \le k;$$

otherwise, $X$ is classified as being from the first distribution. Show that
this rule is based upon a linear function of $X$ and determine its
distribution. This allows us to compute the probabilities of
misclassification.

(b) If $\Sigma_1$ and $\Sigma_2$ are different but known, show that

$$\frac{f(x; \mu_1, \Sigma_1)}{f(x; \mu_2, \Sigma_2)}$$

can be based upon a second degree polynomial in $X$. When either $\Sigma_1$
or $\Sigma_2$ is the correct covariance matrix, does this expression have a
chi-square distribution?








CHAPTER 11


Nonparametric Methods

11.1 Confidence Intervals for Distribution Quantiles 

We shall first define the concept of a quantile of a distribution of
a random variable of the continuous type. Let $X$ be a random variable
of the continuous type with p.d.f. $f(x)$ and distribution function $F(x)$.
Let $p$ denote a positive proper fraction and assume that the equation
$F(x) = p$ has a unique solution for $x$. This unique root is denoted by
the symbol $\xi_p$ and is called the quantile (of the distribution) of order
$p$. Thus $\Pr(X \le \xi_p) = F(\xi_p) = p$. For example, the quantile of order $\frac{1}{2}$
is the median of the distribution and $\Pr(X \le \xi_{0.5}) = F(\xi_{0.5}) = \frac{1}{2}$.

In Chapter 6 we computed the probability that a certain random
interval includes a special point. Frequently, this special point was a
parameter of the distribution of probability under consideration.
Thus we are led to the notion of an interval estimate of a parameter.
If the parameter happens to be a quantile of the distribution, and if we
work with certain functions of the order statistics, it will be seen that
this method of statistical inference is applicable to all distributions
of the continuous type. We call these methods distribution-free
or nonparametric methods of inference.

To obtain a distribution-free confidence interval for $\xi_p$, the quantile
of order $p$, of a distribution of the continuous type with distribution
function $F(x)$, take a random sample $X_1, X_2, \ldots, X_n$ of size $n$ from that
distribution. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of the
sample. Take $Y_i < Y_j$ and consider the event $Y_i < \xi_p < Y_j$. For the $i$th
order statistic $Y_i$ to be less than $\xi_p$ it must be true that at least $i$ of the
$X$ values are less than $\xi_p$. Moreover, for the $j$th order statistic to be
greater than $\xi_p$, fewer than $j$ of the $X$ values are less than $\xi_p$. That is,
if we say that we have a "success" when an individual $X$ value is less
than $\xi_p$, then, in the $n$ independent trials, there must be at least $i$
successes but fewer than $j$ successes for the event $Y_i < \xi_p < Y_j$ to
occur. But since the probability of success on each trial is
$\Pr(X < \xi_p) = F(\xi_p) = p$, the probability of this event is

$$\Pr(Y_i < \xi_p < Y_j) = \sum_{w=i}^{j-1} \frac{n!}{w!\,(n-w)!}\, p^w (1-p)^{n-w},$$

the probability of having at least $i$, but less than $j$, successes. When
particular values of $n$, $i$, and $j$ are specified, this probability can be
computed. By this procedure, suppose it has been found that
$\gamma = \Pr(Y_i < \xi_p < Y_j)$. Then the probability is $\gamma$ that the random
interval $(Y_i, Y_j)$ includes the quantile of order $p$. If the experimental
values of $Y_i$ and $Y_j$ are, respectively, $y_i$ and $y_j$, the interval $(y_i, y_j)$ serves
as a $100\gamma$ percent confidence interval for $\xi_p$, the quantile of order $p$.
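
The binomial sum for $\Pr(Y_i < \xi_p < Y_j)$ is simple to evaluate directly. The following is a minimal sketch under the setup just described (continuous distribution, order statistics $Y_i < Y_j$); the function name `quantile_interval_confidence` is an illustrative choice, not notation from the text.

```python
from math import comb


def quantile_interval_confidence(n, i, j, p):
    """Pr(Y_i < xi_p < Y_j) = sum over w = i, ..., j-1 of C(n, w) p^w (1-p)^(n-w)."""
    if not (1 <= i < j <= n):
        raise ValueError("need 1 <= i < j <= n")
    # At least i but fewer than j of the n observations fall below xi_p.
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(i, j))


# Sample of size 4, interval (Y_1, Y_4), median (p = 1/2): the sum is 14/16.
print(quantile_interval_confidence(4, 1, 4, 0.5))
```

Because the sum depends only on $n$, $i$, $j$, and $p$, the same function serves for any continuous distribution, which is the sense in which the interval is distribution-free.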

An illustrative example follows.

Example 1. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random
sample of size 4 from a distribution of the continuous type. The probability
that the random interval $(Y_1, Y_4)$ includes the median $\xi_{0.5}$ of the distribution
will be computed. We have

$$\Pr(Y_1 < \xi_{0.5} < Y_4) = \sum_{w=1}^{3} \frac{4!}{w!\,(4-w)!} \left(\frac{1}{2}\right)^{4} = 0.875.$$

If $Y_1$ and $Y_4$ are observed to be $y_1 = 2.8$ and $y_4 = 4.2$, respectively, the interval
(2.8, 4.2) is an 87.5 percent confidence interval for the median $\xi_{0.5}$ of the
distribution.

For samples of fairly large size, we can approximate the binomial
probabilities with those associated with normal distributions, as
illustrated in the next example.


Example 2. Let the following numbers represent the values of the order
statistics of $n = 27$ observations obtained in a random sample from a certain
distribution of the continuous type:

61, 69, 71, 74, 79, 80, 83, 84, 86, 87, 92, 93, 96, [illegible],
104, 105, 113, 121, 122, 129, 141, 143, 156, 164, 191, 217, 276.

Say that we are interested in estimating the 25th percentile $\xi_{0.25}$ (that is, the
quantile of order 0.25) of the distribution. Since $(n + 1)p = 28(\frac{1}{4}) = 7$, the
seventh order statistic, $y_7 = 83$, could serve as a point estimate of $\xi_{0.25}$. To get
a confidence interval for $\xi_{0.25}$, consider two order statistics, one less than $y_7$
and the other greater, for illustration, $y_4$ and $y_{10}$. What is the confidence
coefficient associated with the interval $(y_4, y_{10})$? Of course, before the sample
is drawn, we know that

$$\gamma = \Pr(Y_4 < \xi_{0.25} < Y_{10}) = \sum_{w=4}^{9} \binom{27}{w} (0.25)^w (0.75)^{27-w}.$$

That is,

$$\gamma = \Pr(3.5 < W < 9.5),$$

where $W$ is $b(27, \frac{1}{4})$ with mean $\frac{27}{4} = 6.75$ and variance $\frac{81}{16}$. Hence $\gamma$ is
approximately equal to

$$\Phi\left(\frac{9.5 - 6.75}{9/4}\right) - \Phi\left(\frac{3.5 - 6.75}{9/4}\right) = \Phi\left(\frac{11}{9}\right) - \Phi\left(-\frac{13}{9}\right) = 0.814.$$

Thus $(y_4 = 74, y_{10} = 87)$ serves as an approximate 81.4 percent confidence
interval for $\xi_{0.25}$. It should be noted that we could choose other intervals also,
for illustration, $(y_3 = 71, y_{11} = 92)$, and these would have different confidence
coefficients. The persons involved in the study must select the desired
confidence coefficient, and then the appropriate order statistics, $Y_i$ and $Y_j$,
are taken in such a way that $i$ and $j$ are fairly symmetrically located about
$(n + 1)p$.
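
The normal approximation of Example 2 can be compared with the exact binomial sum. The sketch below is an added illustration, with hypothetical function names; it evaluates the standard normal distribution function through `math.erf` rather than a printed table, so its digits may differ slightly from the table-based value 0.814 quoted above.

```python
from math import comb, erf, sqrt


def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def exact_confidence(n, i, j, p):
    """Exact Pr(Y_i < xi_p < Y_j) as a binomial sum over w = i, ..., j-1."""
    return sum(comb(n, w) * p**w * (1 - p)**(n - w) for w in range(i, j))


def normal_approx_confidence(n, i, j, p):
    """Normal approximation to Pr(i - 0.5 < W < j - 0.5), where W is b(n, p)."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    upper = std_normal_cdf((j - 0.5 - mean) / sd)
    lower = std_normal_cdf((i - 0.5 - mean) / sd)
    return upper - lower


# Example 2: n = 27, p = 0.25, interval (Y_4, Y_10).
approx = normal_approx_confidence(27, 4, 10, 0.25)
exact = exact_confidence(27, 4, 10, 0.25)
print(round(approx, 3), round(exact, 3))
```

Trying neighboring pairs such as $(i, j) = (3, 11)$ in the same functions shows how the confidence coefficient changes with the choice of order statistics.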

EXERCISES 


11.1. Let $Y_n$ denote the $n$th order statistic of a random sample of size $n$ from
a distribution of the continuous type. Find the smallest value of $n$ for which
$\Pr(\xi_{0.9} < Y_n) \ge 0.75$.

11.2. Let $Y_1 < Y_2 < Y_3 < Y_4 < Y_5$ denote the order statistics of a random
sample of size 5 from a distribution of the continuous type. Compute:

(a) $\Pr(Y_1 < \xi_{0.5} < Y_5)$.

(b) Pr(7 1 <4 2 5<^ 

(c) Pr (n < ( 0 . 8 0 < F 5 ). 

11.3. Compute Pr(F 3 < < F 7 ) if h < . “ < K are the or <i er statistics of 

a random sample of size 9 from a distribution of the continuous type* 

11.4. Find the smallest value of n for which Pr (Y t < < Y„) ^ 0.99, where 

Y } <- — < Y n are the order statistics of a random sample of size n from a 
distribution of the continuous type, 

11.5. Let $Y_1 < Y_2$ denote the order statistics of a random sample of size 2 from
a distribution which is $N(\mu, \sigma^2)$, where $\sigma^2$ is known.

(a) Show that $\Pr(Y_1 < \mu < Y_2) = \frac{1}{2}$ and compute the expected value of the
random length $Y_2 - Y_1$.

(b) If $\bar{X}$ is the mean of this sample, find the constant $c$ such that
$\Pr(\bar{X} - c\sigma < \mu < \bar{X} + c\sigma) = \frac{1}{2}$ and compare the length of this random
interval with the expected value of that of part (a).

11.6. Let $Y_1 < Y_2 < \cdots < Y_{25}$ be the order statistics of a random sample of
size $n = 25$ from a distribution of the continuous type. Compute
approximately:

(a) Ft (Y, < Us < Y l% y 

(b) Pr(r 2 <4. 2 < nX 

(c) Pr(r ls < 4 8 < 

11.7. Let $Y_1 < Y_2 < \cdots < Y_{100}$ be the order statistics of a random sample of
size $n = 100$ from a distribution of the continuous type. Find $i < j$ so that
$\Pr(Y_i < \xi_{0.2} < Y_j)$ is about equal to 0.95.

11.8. Let $\xi_{1/4}$ be the 25th percentile of a distribution of the continuous type.
Let $Y_1 < Y_2 < \cdots < Y_{48}$ be the order statistics of a random sample of size
$n = 48$ from this distribution.

(a) In terms of “binomial probabilities，” to what is Pr (K 9 < ^ /4 < Y^) 
equal? 

(b) How would you approximate this answer with “normal probabilities ”？ 

(c) Find i such that Pr ( Y n ^ / < ^i /4 < + ,) is as close as possible to 0.95 

(using the normal approximation). 

11.2 Tolerance Limits for Distributions 


We propose now to investigate a problem that has something of
the same flavor as that treated in Section 11.1. Specifically, can we
compute the probability that a certain random interval includes (or
covers) a preassigned percentage of the probability for the distribution
under consideration? And, by appropriate selection of the
random interval, can we be led to an additional distribution-free
method of statistical inference?

Let $X$ be a random variable with distribution function $F(x)$ of the
continuous type. The random variable $Z = F(X)$ is an important
random variable, and its distribution is given in Example 1, Section 4.1.
It is our purpose now to make an interpretation. Since $Z = F(X)$ has
the p.d.f.

$$h(z) = 1, \quad 0 < z < 1,$$
$$\phantom{h(z)} = 0 \quad \text{elsewhere},$$

then, if $0 < p < 1$, we have

$$\Pr[F(X) \le p] = \int_0^p dz = p.$$

Now $F(x) = \Pr(X \le x)$. Since $\Pr(X = x) = 0$, then $F(x)$ is the
fractional part of the probability for the distribution of $X$ that is
between $-\infty$ and $x$. If $F(x) \le p$, then no more than $100p$ percent of
the probability for the distribution of $X$ is between $-\infty$ and $x$. But
recall $\Pr[F(X) \le p] = p$. That is, the probability that the random
variable $Z = F(X)$ is less than or equal to $p$ is precisely the probability
that the random interval $(-\infty, X)$ contains no more than $100p$ percent
of the probability for the distribution. For example, the probability
that the random interval $(-\infty, X)$ contains no more than 70 percent
of the probability for the distribution is 0.70; and the probability that
the random interval $(-\infty, X)$ contains more than 70 percent of the
probability for the distribution is $1 - 0.70 = 0.30$.
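
The statement $\Pr[F(X) \le p] = p$ can be illustrated by simulation. The sketch below is an added illustration, not from the text: it assumes, purely for concreteness, that $X$ is exponential with $F(x) = 1 - e^{-x}$, and it uses a fixed seed so the run is repeatable. Any other continuous distribution with a known $F$ would serve equally well.

```python
import math
import random


def fraction_covered_at_most(p, n_draws=100_000, seed=0):
    """Estimate Pr[F(X) <= p] when X is exponential with F(x) = 1 - exp(-x)."""
    rng = random.Random(seed)       # fixed seed: an assumption for repeatability
    hits = 0
    for _ in range(n_draws):
        x = rng.expovariate(1.0)    # one observation of X
        z = 1.0 - math.exp(-x)      # Z = F(X), which should be uniform on (0, 1)
        if z <= p:
            hits += 1
    return hits / n_draws


# The relative frequency should be close to p = 0.70.
print(fraction_covered_at_most(0.70))
```

The relative frequency estimates the probability that the random interval $(-\infty, X)$ contains no more than 70 percent of the probability for the distribution.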

We now consider certain functions of the order statistics. Let
$X_1, X_2, \ldots, X_n$ denote a random sample of size $n$ from a distribution
that has a positive and continuous p.d.f. $f(x)$ if and only if $a < x < b$;
and let $F(x)$ denote the associated distribution function. Consider
the random variables $F(X_1), F(X_2), \ldots, F(X_n)$. These random
variables are independent and each, in accordance with Example 1,
Section 4.1, has a uniform distribution on the interval (0, 1). Thus
$F(X_1), F(X_2), \ldots, F(X_n)$ is a random sample of size $n$ from a uniform
distribution on the interval (0, 1). Consider the order statistics of this
random sample $F(X_1), F(X_2), \ldots, F(X_n)$. Let $Z_1$ be the smallest of these
$F(X_i)$, $Z_2$ the next $F(X_i)$ in order of magnitude, ..., and $Z_n$ the largest
$F(X_i)$. If $Y_1, Y_2, \ldots, Y_n$ are the order statistics of the initial random
sample $X_1, X_2, \ldots, X_n$, the fact that $F(x)$ is a nondecreasing
(here, strictly increasing) function of $x$ implies that $Z_1 = F(Y_1)$,
$Z_2 = F(Y_2), \ldots, Z_n = F(Y_n)$. Thus the joint p.d.f. of $Z_1, Z_2, \ldots, Z_n$ is
given by

$$h(z_1, z_2, \ldots, z_n) = n!, \quad 0 < z_1 < z_2 < \cdots < z_n < 1,$$
$$\phantom{h(z_1, z_2, \ldots, z_n)} = 0 \quad \text{elsewhere}.$$

This proves a special case of the following theorem.

Theorem 1. Let $Y_1, Y_2, \ldots, Y_n$ denote the order statistics of a
random sample of size $n$ from a distribution of the continuous type that
has p.d.f. $f(x)$ and distribution function $F(x)$. The joint p.d.f. of the
random variables $Z_i = F(Y_i)$, $i = 1, 2, \ldots, n$, is

$$h(z_1, z_2, \ldots, z_n) = n!, \quad 0 < z_1 < z_2 < \cdots < z_n < 1,$$
$$\phantom{h(z_1, z_2, \ldots, z_n)} = 0 \quad \text{elsewhere}.$$

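
Because the distribution function of $Z = F(X)$ is given by $z$,
$0 < z < 1$, the marginal p.d.f. of $Z_k = F(Y_k)$ is the following beta p.d.f.:

$$h_k(z_k) = \frac{n!}{(k-1)!\,(n-k)!}\, z_k^{k-1}(1 - z_k)^{n-k}, \quad 0 < z_k < 1, \qquad (1)$$
$$\phantom{h_k(z_k)} = 0 \quad \text{elsewhere}.$$

Moreover, the joint p.d.f. of $Z_i = F(Y_i)$ and $Z_j = F(Y_j)$ is, with $i < j$,
given by

$$h_{ij}(z_i, z_j) = \frac{n!\, z_i^{i-1}(z_j - z_i)^{j-i-1}(1 - z_j)^{n-j}}{(i-1)!\,(j-i-1)!\,(n-j)!}, \quad 0 < z_i < z_j < 1, \qquad (2)$$
$$\phantom{h_{ij}(z_i, z_j)} = 0 \quad \text{elsewhere}.$$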

Consider the difference $Z_j - Z_i = F(Y_j) - F(Y_i)$, $i < j$. Now
$F(y_j) = \Pr(X \le y_j)$ and $F(y_i) = \Pr(X \le y_i)$. Since $\Pr(X = y_i) =
\Pr(X = y_j) = 0$, then the difference $F(y_j) - F(y_i)$ is that fractional part
of the probability for the distribution of $X$ that is between $y_i$ and $y_j$.
Let $p$ denote a positive proper fraction. If $F(y_j) - F(y_i) \ge p$, then at
least $100p$ percent of the probability for the distribution of $X$ is between
$y_i$ and $y_j$. Let it be given that $\gamma = \Pr[F(Y_j) - F(Y_i) \ge p]$. Then the
random interval $(Y_i, Y_j)$ has probability $\gamma$ of containing at least $100p$
percent of the probability for the distribution of $X$. If now $y_i$ and $y_j$
denote, respectively, experimental values of $Y_i$ and $Y_j$, the interval
$(y_i, y_j)$ either does or does not contain at least $100p$ percent
of the probability for the distribution of $X$. However, we refer to the
interval $(y_i, y_j)$ as a $100\gamma$ percent tolerance interval for $100p$ percent of
the probability for the distribution of $X$. In like vein, $y_i$ and $y_j$ are called
$100\gamma$ percent tolerance limits for $100p$ percent of the probability for the
distribution of $X$.

One way to compute the probability $\gamma = \Pr[F(Y_j) - F(Y_i) \ge p]$ is
to use Equation (2), which gives the joint p.d.f. of $Z_i = F(Y_i)$ and
$Z_j = F(Y_j)$. The required probability is then given by

$$\gamma = \Pr(Z_j - Z_i \ge p) = \int_0^{1-p} \left[\int_{p+z_i}^{1} h_{ij}(z_i, z_j)\, dz_j\right] dz_i.$$

Sometimes, this is a rather tedious computation. For this reason and
for the reason that coverages are important in distribution-free
statistical inference, we choose to introduce at this time the concept of
a coverage.

Consider the random variables $W_1 = F(Y_1) = Z_1$, $W_2 = F(Y_2) - F(Y_1) = Z_2 - Z_1$,
$W_3 = F(Y_3) - F(Y_2) = Z_3 - Z_2$, ..., $W_n = F(Y_n) - F(Y_{n-1}) = Z_n - Z_{n-1}$.
The random variable $W_1$ is called a coverage of
the random interval $\{x : -\infty < x < Y_1\}$ and the random variable $W_i$,
$i = 2, 3, \ldots, n$, is called a coverage of the random interval
$\{x : Y_{i-1} < x < Y_i\}$. We shall find the joint p.d.f. of the $n$ coverages
$W_1, W_2, \ldots, W_n$. First we note that the inverse functions of the
associated transformation are given by

$$z_1 = w_1,\quad z_2 = w_1 + w_2,\quad z_3 = w_1 + w_2 + w_3,\quad \ldots,\quad z_n = w_1 + w_2 + w_3 + \cdots + w_n.$$

We also note that the Jacobian is equal to 1 and that the space of
positive probability density is

$$\{(w_1, w_2, \ldots, w_n) : 0 < w_i,\ i = 1, 2, \ldots, n,\ w_1 + \cdots + w_n < 1\}.$$

Since the joint p.d.f. of $Z_1, Z_2, \ldots, Z_n$ is $n!$, $0 < z_1 < z_2 < \cdots < z_n < 1$,
zero elsewhere, the joint p.d.f. of the $n$ coverages is

$$k(w_1, \ldots, w_n) = n!, \quad 0 < w_i,\ i = 1, \ldots, n,\ w_1 + \cdots + w_n < 1,$$
$$\phantom{k(w_1, \ldots, w_n)} = 0 \quad \text{elsewhere}.$$








A reexamination of Example 1 of Section 4.5 reveals that this is a
Dirichlet p.d.f. with $k = n$ and $\alpha_1 = \alpha_2 = \cdots = \alpha_{n+1} = 1$.

Because the p.d.f. $k(w_1, \ldots, w_n)$ is symmetric in $w_1, w_2, \ldots, w_n$, it
is evident that the distribution of every sum of $r$, $r < n$, of these
coverages is exactly the same for each fixed value of $r$.

For instance, if $i < j$ and $r = j - i$, the distribution of $Z_j - Z_i =
F(Y_j) - F(Y_i) = W_{i+1} + W_{i+2} + \cdots + W_j$ is exactly the same as that
of $Z_{j-i} = F(Y_{j-i}) = W_1 + W_2 + \cdots + W_{j-i}$. But we know that the
p.d.f. of $Z_{j-i}$ is the beta p.d.f. of the form

$$h_{j-i}(v) = \frac{\Gamma(n+1)}{\Gamma(j-i)\,\Gamma(n-j+i+1)}\, v^{\,j-i-1}(1-v)^{\,n-j+i}, \quad 0 < v < 1,$$
$$\phantom{h_{j-i}(v)} = 0 \quad \text{elsewhere}.$$

Consequently, $F(Y_j) - F(Y_i)$ has this p.d.f. and

$$\Pr[F(Y_j) - F(Y_i) \ge p] = \int_p^1 h_{j-i}(v)\,dv.$$

Example 1. Let $Y_1 < Y_2 < \cdots < Y_6$ be the order statistics of a random
sample of size 6 from a distribution of the continuous type. We want to use
the observed interval $(y_1, y_6)$ as a tolerance interval for 80 percent of the
distribution. Then

$$\gamma = \Pr[F(Y_6) - F(Y_1) \ge 0.8] = 1 - \int_0^{0.8} 30v^4(1-v)\,dv,$$

because the integrand is the p.d.f. of $F(Y_6) - F(Y_1)$. Accordingly,

$$\gamma = 1 - 6(0.8)^5 + 5(0.8)^6 = 0.34,$$

approximately. That is, the observed values of $Y_1$ and $Y_6$ will define a 34
percent tolerance interval for 80 percent of the probability for the distribution.
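The tolerance probability in this example can be checked numerically. The sketch below is illustrative only: the function name `tolerance_gamma` is an assumption, and it relies on the standard fact that a beta variable with integer parameters (the $r$th order statistic of $n$ uniform values) exceeds $p$ exactly when fewer than $r$ of the $n$ values fall below $p$, so the beta integral becomes a binomial sum.

```python
from math import comb


def tolerance_gamma(n: int, i: int, j: int, p: float) -> float:
    """Pr[F(Y_j) - F(Y_i) >= p] for order statistics of a sample of size n.

    F(Y_j) - F(Y_i) is distributed like the r-th order statistic of n
    uniform(0, 1) values, r = j - i. That order statistic is >= p exactly
    when fewer than r of the n values fall below p.
    """
    r = j - i
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r))


# Example 1: n = 6, interval (y_1, y_6), 80 percent of the distribution.
gamma = tolerance_gamma(n=6, i=1, j=6, p=0.8)
print(round(gamma, 2))
```

The same function applies to any pair $i < j$, since only the difference $j - i$ and the sample size enter the distribution.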

Example 2. Each of the coverages $W_i$, $i = 1, 2, \ldots, n$, has the beta p.d.f.

$$k_1(w) = n(1-w)^{n-1}, \quad 0 < w < 1,$$
$$\phantom{k_1(w)} = 0 \quad \text{elsewhere},$$

because $W_1 = Z_1 = F(Y_1)$ has this p.d.f. Accordingly, the mathematical
expectation of each $W_i$ is

$$\int_0^1 nw(1-w)^{n-1}\,dw = \frac{1}{n+1}.$$

Now the coverage $W_i$ can be thought of as the area under the graph of the
p.d.f. $f(x)$, above the $x$-axis, and between the lines $x = Y_{i-1}$ and $x = Y_i$. (We
take $Y_0 = -\infty$.) Thus the expected value of each of these random areas $W_i$,
$i = 1, 2, \ldots, n$, is $1/(n+1)$. That is, the order statistics partition the
probability for the distribution into $n + 1$ parts, and the expected value of each
of these parts is $1/(n+1)$. More generally, the expected value of $F(Y_j) - F(Y_i)$,
$i < j$, is $(j - i)/(n + 1)$, since $F(Y_j) - F(Y_i)$ is the sum of $j - i$ of these
coverages. This result provides a reason for calling $Y_k$, where $(n + 1)p = k$,
the $(100p)$th percentile of the sample, since

$$E[F(Y_k)] = \frac{k}{n+1} = \frac{(n+1)p}{n+1} = p.$$

EXERCISES

11.9. Let $Y_1$ and $Y_n$ be, respectively, the first and $n$th order statistics of a
random sample of size $n$ from a distribution of the continuous type having
distribution function $F(x)$. Find the smallest value of $n$ such that
$\Pr[F(Y_n) - F(Y_1) \ge 0.5]$ is at least 0.95.

11.10. Let $Y_2$ and $Y_{n-1}$ denote the second and the $(n-1)$st order statistics
of a random sample of size $n$ from a distribution of the continuous type
having distribution function $F(x)$. Compute $\Pr[F(Y_{n-1}) - F(Y_2) \ge p]$,
where $0 < p < 1$.

11.11. Let $Y_1 < Y_2 < \cdots < Y_{48}$ be the order statistics of a random sample
of size 48 from a distribution of the continuous type. We want to use the
observed interval $(y_4, y_{45})$ as a $100\gamma$ percent tolerance interval for 75 percent
of the distribution.

(a) To what is $\gamma$ equal?

(b) Approximate the integral in part (a) by noting that it can be written as
a partial sum of a binomial p.d.f., which in turn can be approximated
by probabilities associated with a normal distribution.

11.12. Let $Y_1 < Y_2 < \cdots < Y_n$ be the order statistics of a random sample of
size $n$ from a distribution of the continuous type having distribution
function $F(x)$.

(a) What is the distribution of $U = 1 - F(Y_j)$?

(b) Determine the distribution of $V = F(Y_n) - F(Y_j) + F(Y_i) - F(Y_1)$,
where $i < j$.

11.13. Let $Y_1 < Y_2 < \cdots < Y_{10}$ be the order statistics of a random sample
from a continuous-type distribution with distribution function $F(x)$. What
is the joint distribution of $V_1 = F(Y_4) - F(Y_2)$ and $V_2 =$

11.3 The Sign Test

Some of the chi-square tests of Section 6.6 are illustrative of the
type of tests that we investigate in the remainder of this chapter. Recall,
in that section, we tested the hypothesis that the distribution of a
certain random variable $X$ is a specified distribution. We did this in the
following manner. The space of $X$ was partitioned into $k$ mutually
disjoint sets $A_1, A_2, \ldots, A_k$. The probability $p_{i0}$ that $X \in A_i$ was
computed under the assumption that the specified distribution is the
correct distribution, $i = 1, 2, \ldots, k$. The original hypothesis was then
replaced by the less restrictive hypothesis

$$H_0 : \Pr(X \in A_i) = p_{i0}, \quad i = 1, 2, \ldots, k;$$

and a chi-square test, based upon a statistic that was denoted by $Q_{k-1}$,
was used to test the hypothesis $H_0$ against all alternative hypotheses.

There is a certain subjective element in the use of this test, namely
the choice of $k$ and of $A_1, A_2, \ldots, A_k$. But it is important to note that
the limiting distribution of $Q_{k-1}$, under $H_0$, is $\chi^2(k-1)$; that is, the
distribution of $Q_{k-1}$ is free of $p_{10}, p_{20}, \ldots, p_{k0}$ and, accordingly, of the
specified distribution of $X$. Here, and elsewhere, "under $H_0$" means
when $H_0$ is true. A test of a hypothesis $H_0$ based upon a statistic whose
distribution, under $H_0$, does not depend upon the specified distribution
or any parameters of that distribution is called a distribution-free or a
nonparametric test.

Next, let $F(x)$ be the unknown distribution function of the random
variable $X$. Let there be given two numbers $\xi$ and $p_0$, where $0 < p_0 < 1$.
We wish to test the hypothesis $H_0 : F(\xi) = p_0$, that is, the hypothesis
that $\xi = \xi_{p_0}$, the quantile of order $p_0$ of the distribution of $X$. We could
use the statistic $Q_{k-1}$, with $k = 2$, to test $H_0$ against all alternatives.
Suppose, however, that we are interested only in the alternative
hypothesis, which is $H_1 : F(\xi) > p_0$. One procedure is to base the test
of $H_0$ against $H_1$ upon the random variable $Y$, which is the number of
observations less than or equal to $\xi$ in a random sample of size $n$ from
the distribution. The statistic $Y$ can be thought of as the number of
"successes" throughout $n$ independent trials. Then, if $H_0$ is true, $Y$
is $b[n, p_0 = F(\xi)]$; whereas if $H_0$ is false, $Y$ is $b[n, p = F(\xi)]$ whatever be
the distribution function $F(x)$. We reject $H_0$ and accept $H_1$ if and
only if the observed value $y \ge c$, where $c$ is an integer selected
such that $\Pr(Y \ge c; H_0)$ is some reasonable significance level $\alpha$. The
power function of the test is given by

$$K(p) = \sum_{y=c}^{n} \binom{n}{y} p^y (1-p)^{n-y}, \quad p_0 \le p < 1,$$

where $p = F(\xi)$. In certain instances, we may wish to approximate
$K(p)$ by using an approximation to the binomial distribution.

Suppose that the alternative hypothesis to $H_0 : F(\xi) = p_0$ is
$H_1 : F(\xi) < p_0$. Then the critical region is a set $\{y : y \le c_1\}$. Finally, if
the alternative hypothesis is $H_1 : F(\xi) \ne p_0$, the critical region is a set
$\{y : y \le c_2 \text{ or } c_3 \le y\}$.

Frequently, $p_0 = \tfrac12$ and, in that case, the hypothesis is that the given
number $\xi$ is a median of the distribution. In the following example,
this value of $p_0$ is used.

Example 1. Let $X_1, X_2, \ldots, X_{10}$ be a random sample of size 10 from a
distribution with distribution function $F(x)$. We wish to test the hypothesis
$H_0 : F(72) = \tfrac12$ against the alternative hypothesis $H_1 : F(72) > \tfrac12$. Let $Y$ be the
number of sample items that are less than or equal to 72. Let the observed
value of $Y$ be $y$, and let the test be defined by the critical region $\{y : y \ge 8\}$.
The power function of the test is given by

$$K(p) = \sum_{y=8}^{10} \binom{10}{y} p^y (1-p)^{10-y}, \quad \tfrac12 \le p < 1,$$

where $p = F(72)$. In particular, the significance level is

$$\alpha = K(\tfrac12) = \left[\binom{10}{8} + \binom{10}{9} + \binom{10}{10}\right]\left(\tfrac12\right)^{10} = \frac{7}{128}.$$
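The power function of the sign test is a binomial upper-tail sum, so it is easy to evaluate directly. The following is a minimal sketch; the function name `sign_test_power` is an assumption made for illustration.

```python
from math import comb


def sign_test_power(n: int, c: int, p: float) -> float:
    """K(p) = Pr(Y >= c) when Y is b(n, p), i.e. the power of the test
    that rejects H0 when at least c observations are <= xi."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(c, n + 1))


# Example 1: n = 10, critical region {y : y >= 8}.
alpha = sign_test_power(n=10, c=8, p=0.5)   # significance level K(1/2)
power_at_08 = sign_test_power(n=10, c=8, p=0.8)
print(alpha, power_at_08)
```

Evaluating the function at $p = \tfrac12$ reproduces the significance level $7/128$, and values of $p$ above $\tfrac12$ give the power against the corresponding alternatives.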

In many places in the literature, the test that we have just described
is called the sign test. The reason for this terminology is that the test
is based upon a statistic $Y$ that is equal to the number of nonpositive
signs in the sequence $X_1 - \xi, X_2 - \xi, \ldots, X_n - \xi$. In the next section
a distribution-free test, which considers both the sign and the
magnitude of each deviation $X_i - \xi$, is studied.

EXERCISES 


11.14. Suggest a chi-square test of the hypothesis which states that a
distribution is one of the beta type, with parameters $\alpha = 2$ and $\beta = 2$.
Further, suppose that the test is to be based upon a random sample of size
100. In the solution, give $k$, define $A_1, A_2, \ldots, A_k$, and compute each
$p_{i0}$. If possible, compare your proposal with those of other students. Are
any of them the same?

11.15. Let $X_1, X_2, \ldots, X_{48}$ be a random sample of size 48 from a distribution
that has the distribution function $F(x)$. To test $H_0 : F(41) = \tfrac14$ against
$H_1 : F(41) < \tfrac14$, use the statistic $Y$, which is the number of sample
observations less than or equal to 41. If the observed value of $Y$ is $y \le 7$,
reject $H_0$ and accept $H_1$. If $p = F(41)$, find the power function $K(p)$,
$0 < p \le \tfrac14$, of the test. Approximate $\alpha = K(\tfrac14)$.

11.16. Let $X_1, X_2, \ldots, X_{100}$ be a random sample of size 100 from a
distribution that has distribution function $F(x)$. To test $H_0 : F(90) - F(60) = \tfrac12$
against $H_1 : F(90) - F(60) > \tfrac12$, use the statistic $Y$, which is the number of
sample observations less than or equal to 90 but greater than 60. If the
observed value of $Y$, say $y$, is such that $y \ge c$, reject $H_0$. Find $c$ so that
$\alpha = 0.05$, approximately.

11.17. Let $X_1, X_2, \ldots, X_n$ be a random sample from some continuous-type
distribution. We wish to consider only unbiased estimators of $\Pr(X \le c)$,
where $c$ is a fixed constant.

(a) What would you use as an unbiased estimator if you had no additional
assumptions about the distribution?

(b) What would you use as an unbiased estimator if you knew the
distribution was normal with unknown mean $\mu$ and variance $\sigma^2 = 1$?

11.4 A Test of Wilcoxon

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a
distribution with distribution function $F(x)$. We have considered a test
of the hypothesis $F(\xi) = \tfrac12$, $\xi$ given, which is based upon the signs of
the deviations $X_1 - \xi, X_2 - \xi, \ldots, X_n - \xi$. In this section a statistic is
studied that takes into account not only these signs, but also the
magnitudes of the deviations.

To find such a statistic that is distribution-free, we must make two
additional assumptions:

1. $F(x)$ is the distribution function of a continuous type of random
variable $X$.

2. The p.d.f. $f(x)$ of $X$ has a graph that is symmetric about the vertical
axis through $\xi_{0.5}$, the median (which we assume to be unique) of the
distribution.

Thus

$$F(\xi_{0.5} - x) = 1 - F(\xi_{0.5} + x)$$

and

$$f(\xi_{0.5} - x) = f(\xi_{0.5} + x),$$

for all $x$. Moreover, the probability that any two observations of a
random sample are equal is zero, and in our discussion we shall assume
that no two are equal.

The problem is to test the hypothesis that the median $\xi_{0.5}$ of the
distribution is equal to a fixed number, say $\xi$. Thus we may, in all cases
and without loss of generality, take $\xi = 0$. The reason for this is that
if $\xi \ne 0$, then the fixed $\xi$ can be subtracted from each sample
observation and the resulting variables can be used to test the
hypothesis that their underlying distribution is symmetric about zero.
Hence our conditions on $F(x)$ and $f(x)$ become $F(-x) = 1 - F(x)$ and
$f(-x) = f(x)$, respectively.
To test the hypothesis $H_0 : F(0) = \tfrac12$, we proceed by first ranking
$X_1, X_2, \ldots, X_n$ according to magnitude, disregarding their algebraic
signs. Let $R_i$ be the rank of $|X_i|$ among $|X_1|, |X_2|, \ldots, |X_n|$,
$i = 1, 2, \ldots, n$. For example, if $n = 3$ and if we have $|X_2| < |X_3| < |X_1|$,
then $R_1 = 3$, $R_2 = 1$, and $R_3 = 2$. Thus $R_1, R_2, \ldots, R_n$ is an
arrangement of the first $n$ positive integers $1, 2, \ldots, n$. Further, let $Z_i$,
$i = 1, 2, \ldots, n$, be defined by

$$Z_i = -1, \quad \text{if } X_i < 0,$$
$$\phantom{Z_i} = 1, \quad \text{if } X_i > 0.$$

If we recall that $\Pr(X_i = 0) = 0$, we see that it does not change the
probabilities whether we associate $Z_i = 1$ or $Z_i = -1$ with the outcome
$X_i = 0$.

The statistic $W = \sum_{i=1}^{n} Z_i R_i$ is the Wilcoxon statistic. Note that in
computing this statistic we simply associate the sign of each $X_i$ with the
rank of its absolute value and sum the resulting $n$ products.

If the alternative to the hypothesis $H_0 : \xi_{0.5} = 0$ is $H_1 : \xi_{0.5} > 0$, we
reject $H_0$ if the observed value of $W$ is an element of the set $\{w : w \ge c\}$.
This is due to the fact that large positive values of $W$ indicate that
most of the large deviations from zero are positive. For alternatives
$\xi_{0.5} < 0$ and $\xi_{0.5} \ne 0$ the critical regions are, respectively, the sets
$\{w : w \le c_1\}$ and $\{w : w \le c_2 \text{ or } w \ge c_3\}$. To compute probabilities like
$\Pr(W \ge c; H_0)$, we need to determine the distribution of $W$, under $H_0$.

To help us find the distribution of $W$, when $H_0 : F(0) = \tfrac12$ is true, we
note the following facts:

1. The assumption that $f(x) = f(-x)$ ensures that $\Pr(X_i < 0) =
\Pr(X_i > 0) = \tfrac12$, $i = 1, 2, \ldots, n$.

2. Now $Z_i = -1$ if $X_i < 0$ and $Z_i = 1$ if $X_i > 0$, $i = 1, 2, \ldots, n$. Hence
we have $\Pr(Z_i = -1) = \Pr(Z_i = 1) = \tfrac12$, $i = 1, 2, \ldots, n$. Moreover,
$Z_1, Z_2, \ldots, Z_n$ are independent because $X_1, X_2, \ldots, X_n$ are
independent.

3. The assumption that $f(x) = f(-x)$ also assures that the rank $R_i$ of
$|X_i|$ does not depend upon the sign $Z_i$ of $X_i$. More generally,
$R_1, R_2, \ldots, R_n$ are independent of $Z_1, Z_2, \ldots, Z_n$.

4. A sum $W$ is made up of the numbers $1, 2, \ldots, n$, each number with
either a positive or a negative sign.

The preceding observations enable us to say that $W = \sum_{i=1}^{n} Z_i R_i$
has the same distribution as the random variable $V = \sum_{i=1}^{n} V_i$, where
$V_1, V_2, \ldots, V_n$ are independent and

$$\Pr(V_i = i) = \Pr(V_i = -i) = \tfrac12,$$

$i = 1, 2, \ldots, n$. That $V_1, V_2, \ldots, V_n$ are independent follows from the
fact that $Z_1, Z_2, \ldots, Z_n$ have that property; that is, the numbers
$1, 2, \ldots, n$ always appear in a sum $W$ and those numbers receive
their algebraic signs by independent assignment. Thus each of
$V_1, V_2, \ldots, V_n$ is like one and only one of $Z_1R_1, Z_2R_2, \ldots, Z_nR_n$.

Since $W$ and $V$ have the same distribution, the m.g.f. of $W$ is that
of $V$,

$$M(t) = E\left[\exp\left(t\sum_{i=1}^{n} V_i\right)\right] = \prod_{i=1}^{n} E(e^{tV_i}) = \prod_{i=1}^{n} \left(\frac{e^{-it} + e^{it}}{2}\right).$$

We can express $M(t)$ as the sum of terms of the form $(a_j/2^n)e^{b_j t}$. When
$M(t)$ is written in this manner, we can determine by inspection the
p.d.f. of the discrete-type random variable $W$. For example, the
smallest value of $W$ is found from the term $(1/2^n)e^{-t}e^{-2t}\cdots e^{-nt} =
(1/2^n)e^{-n(n+1)t/2}$ and it is $-n(n+1)/2$. The probability of this value
of $W$ is the coefficient $1/2^n$. To make these statements more concrete,
take $n = 3$. Then

$$M(t) = \left(\frac{e^{-t} + e^{t}}{2}\right)\left(\frac{e^{-2t} + e^{2t}}{2}\right)\left(\frac{e^{-3t} + e^{3t}}{2}\right)$$
$$\phantom{M(t)} = \left(\tfrac18\right)\left(e^{-6t} + e^{-4t} + e^{-2t} + 2 + e^{2t} + e^{4t} + e^{6t}\right).$$

Thus the p.d.f. of $W$, for $n = 3$, is given by

$$g(w) = \tfrac18, \quad w = -6, -4, -2, 2, 4, 6,$$
$$\phantom{g(w)} = \tfrac28, \quad w = 0,$$
$$\phantom{g(w)} = 0, \quad \text{elsewhere}.$$
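Expanding the product form of the m.g.f. by hand is the same as convolving the two-point distributions of $V_1, V_2, \ldots, V_n$ one at a time. The sketch below does this with exact fractions; the function name `wilcoxon_null_pmf` is an assumption made for illustration, not notation from the text.

```python
from collections import Counter
from fractions import Fraction


def wilcoxon_null_pmf(n: int) -> dict:
    """Exact p.d.f. of W under H0, built as the distribution of
    V_1 + ... + V_n, where V_i is +i or -i with probability 1/2 each."""
    pmf = Counter({0: Fraction(1)})
    for i in range(1, n + 1):
        nxt = Counter()
        for w, prob in pmf.items():
            nxt[w - i] += prob / 2   # V_i = -i
            nxt[w + i] += prob / 2   # V_i = +i
        pmf = nxt
    return dict(sorted(pmf.items()))


pmf3 = wilcoxon_null_pmf(3)
for w, prob in pmf3.items():
    print(w, prob)
```

Each pass through the loop corresponds to multiplying the m.g.f. by one more factor $(e^{-it} + e^{it})/2$, so the same routine gives the distribution for larger $n$ without further algebra.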

The mean and the variance of $W$ are more easily computed
directly than by working with the m.g.f. $M(t)$. Because $V = \sum_{i=1}^{n} V_i$
and $W = \sum_{i=1}^{n} Z_i R_i$ have the same distribution, they have the same mean
and the same variance. When the hypothesis $H_0 : F(0) = \tfrac12$ is true, it is
easy to determine the values of these two characteristics of the
distribution of $W$. Since $E(V_i) = 0$, $i = 1, 2, \ldots, n$, we have

$$\mu_W = E(W) = \sum_{i=1}^{n} E(V_i) = 0.$$

The variance of $V_i$ is $(-i)^2(\tfrac12) + (i)^2(\tfrac12) = i^2$. Thus the variance of $W$ is

$$\sigma_W^2 = \sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6}.$$

For large values of $n$, the determination of the exact distribution
of $W$ becomes tedious. Accordingly, one looks for an approximating
distribution. Although $W$ is distributed as the sum of $n$ random
variables that are independent, our form of the central limit theorem
cannot be applied because the $n$ random variables do not have identical
distributions. However, a more general theorem, due to Liapounov,
states that if $U_i$ has mean $\mu_i$ and variance $\sigma_i^2$, $i = 1, 2, \ldots, n$, if
$U_1, U_2, \ldots, U_n$ are independent, if $E(|U_i - \mu_i|^3)$ is finite for every $i$,
and if

$$\lim_{n\to\infty} \frac{\sum_{i=1}^{n} E(|U_i - \mu_i|^3)}{\left(\sum_{i=1}^{n} \sigma_i^2\right)^{3/2}} = 0,$$

then

$$\frac{\sum_{i=1}^{n} U_i - \sum_{i=1}^{n} \mu_i}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$$

has a limiting distribution that is $N(0, 1)$. For our variables
$V_1, V_2, \ldots, V_n$ we have

$$E(|V_i - \mu_i|^3) = i^3(\tfrac12) + i^3(\tfrac12) = i^3,$$

and it is known that

$$\sum_{i=1}^{n} i^3 = \frac{n^2(n+1)^2}{4}.$$

Now

$$\lim_{n\to\infty} \frac{n^2(n+1)^2/4}{[n(n+1)(2n+1)/6]^{3/2}} = 0$$

because the numerator is of order $n^4$ and the denominator is of order
$n^{9/2}$. Thus

$$\frac{W}{\sqrt{n(n+1)(2n+1)/6}}$$

is approximately $N(0, 1)$ when $H_0$ is true. This allows us to approximate
probabilities like $\Pr(W \ge c; H_0)$ when the sample size $n$ is large.

Example 1. Let $\xi_{0.5}$ be the median of a symmetric distribution that is of
the continuous type. To test, with $\alpha = 0.01$, the hypothesis $H_0 : \xi_{0.5} = 75$
against $H_1 : \xi_{0.5} > 75$, we observed a random sample of size $n = 18$. Let it be
given that the deviations of these 18 values from 75 are the following numbers:

1.5, -0.5, 1.6, 0.4, 2.3, -0.8, 3.2, 0.9, 2.9,
0.3, 1.8, -0.1, 1.2, 2.5, 0.6, -0.7, 1.9, 1.3.

The experimental value of the Wilcoxon statistic is equal to

$$w = 11 - 4 + 12 + 3 + 15 - 7 + 18 + 8 + 17 + 2 + 13 - 1 + 9 + 16 + 5 - 6 + 14 + 10 = 135.$$

Since, with $n = 18$ so that $\sqrt{n(n+1)(2n+1)/6} = 45.92$, we have that

$$0.01 = \Pr\left(\frac{W}{45.92} \ge 2.326\right) = \Pr(W \ge 106.8).$$

Because $w = 135 > 106.8$, we reject $H_0$ at the approximate 0.01 significance
level. The $p$-value associated with $135/45.92 = 2.94$ is about $1 - 0.998 =
0.002$ since $\Phi(2.94) = 0.998$.
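The arithmetic of this example can be reproduced in a few lines. This is a sketch under stated assumptions: the 18 deviations are entered as read from the example, the helper name `wilcoxon_w` is hypothetical, and the code assumes no zero deviations and no tied magnitudes, as the text does.

```python
from math import sqrt
from statistics import NormalDist

# Deviations from 75, as listed in the example.
deviations = [1.5, -0.5, 1.6, 0.4, 2.3, -0.8, 3.2, 0.9, 2.9,
              0.3, 1.8, -0.1, 1.2, 2.5, 0.6, -0.7, 1.9, 1.3]


def wilcoxon_w(xs):
    """W = sum of Z_i * R_i: rank the |x_i|, then attach the sign of x_i."""
    order = sorted(range(len(xs)), key=lambda i: abs(xs[i]))
    w = 0
    for rank, i in enumerate(order, start=1):
        w += rank if xs[i] > 0 else -rank
    return w


n = len(deviations)
w = wilcoxon_w(deviations)
sd = sqrt(n * (n + 1) * (2 * n + 1) / 6)   # standard deviation of W under H0
z = w / sd
p_value = 1 - NormalDist().cdf(z)          # normal approximation, upper tail
print(w, round(sd, 2), round(z, 2), round(p_value, 3))
```

The standardized value exceeds 2.326, which agrees with the decision to reject $H_0$ at the approximate 0.01 level.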

There are many modifications and generalizations of the Wilcoxon
statistic. One generalization is the following: Let $c_1 \le c_2 \le \cdots \le c_n$ be
nonnegative numbers. Then, in the Wilcoxon statistic, replace the
ranks $1, 2, \ldots, n$ by $c_1, c_2, \ldots, c_n$, respectively. For example, if $n = 3$
and if we have $|X_2| < |X_3| < |X_1|$, then $R_1 = 3$ is replaced by $c_3$, $R_2 = 1$
by $c_1$, and $R_3 = 2$ by $c_2$. In this example, the generalized statistic is
given by $Z_1c_3 + Z_2c_1 + Z_3c_2$. Similar to the Wilcoxon statistic,
this generalized statistic is distributed under $H_0$ as the sum of $n$
independent random variables, the $i$th of which takes each of the values
$c_i \ne 0$ and $-c_i$ with probability $\tfrac12$; if $c_i = 0$, that variable takes the
value $c_i = 0$ with probability 1. Some special cases of this statistic
are proposed in the Exercises.

EXERCISES 

11.18. The observed values of a random sample of size 10 from a distribution
that is symmetric about $\xi_{0.5}$ are 10.2, 14.1, 9.2, 11.3, 7.2, 9.8, 6.5, 11.8, 8.7,
10.8. Use Wilcoxon's statistic to test the hypothesis $H_0 : \xi_{0.5} = 8$ against
$H_1 : \xi_{0.5} > 8$ if $\alpha = 0.05$. Even though $n$ is small, use the normal
approximation and find the $p$-value.

11.19. Find the distribution of $W$ for $n = 4$ and $n = 5$.
Hint: Multiply the moment-generating function of $W$, with $n = 3$, by
$(e^{-4t} + e^{4t})/2$ to get that of $W$, with $n = 4$.

11.20. Let $X_1, X_2, \ldots, X_n$ be independent. If the p.d.f. of $X_i$ is uniform over
the interval $(-2^{1-i}, 2^{1-i})$, $i = 1, 2, 3, \ldots$, show that Liapounov's
condition is not satisfied. The sum $\sum_{i=1}^{n} X_i$ does not have an approximate
normal distribution because the first random variables in the sum tend to
dominate it.

11.21. If $n = 4$ and, in the notation of the text, $c_1 = 1$, $c_2 = 2$, $c_3 = c_4 = 3$, find
the distribution of the generalization of the Wilcoxon statistic, say $W_g$. For
a general $n$, find the mean and the variance of $W_g$ if $c_i = i$, $i \le n/2$, and
$c_i = [n/2] + 1$, $i > n/2$, where $[z]$ is the greatest integer function. Does
Liapounov's condition hold here?

11.22. A modification of Wilcoxon's statistic that is frequently used is
achieved by replacing $R_i$ by $R_i - 1$; that is, use the modification
$W_m = \sum_{i=1}^{n} Z_i(R_i - 1)$. Show that $W_m/\sqrt{(n-1)n(2n-1)/6}$ has a limiting
distribution that is $N(0, 1)$.

11.23. If, in the discussion of the generalization of the Wilcoxon statistic, we
let $c_1 = c_2 = \cdots = c_n = 1$, show that we obtain a statistic equivalent to that
used in the sign test.

11.24. If $c_1, c_2, \ldots, c_n$ are selected so that $i/(n+1) = \int_0^{c_i} \sqrt{2/\pi}\, e^{-x^2/2}\,dx$,
$i = 1, 2, \ldots, n$, the generalized Wilcoxon $W_g$ is an example of a normal
scores statistic. If $n = 9$, compute the mean and the variance of this $W_g$.

11.25. If $c_i = 2^i$, $i = 1, 2, \ldots, n$, the corresponding $W_g$ is called the binary
statistic. Find the mean and the variance of this $W_g$. Is Liapounov's
condition satisfied?

11.26. In the definition of Wilcoxon's statistic, let $W_1$ be the sum of the ranks
of those observations of the sample that are positive and let $W_2$ be the sum
of the ranks of those observations that are negative. Then $W = W_1 - W_2$.

(a) Show that $W = 2W_1 - n(n+1)/2$ and $W = n(n+1)/2 - 2W_2$.

(b) Compute the mean and the variance of each of $W_1$ and $W_2$.

11.27. Let $X_1, X_2, \ldots, X_{2n}$ be a random sample of size $2n$ from a
continuous-type distribution that is symmetric about zero. Modify the
Wilcoxon statistic by replacing the scores (ranks) $1, 2, \ldots, 2n$ by the scores
consisting of $n$ ones and $n$ twos. Call this statistic $W$.

(a) Find the variance of $W$.

(b) Argue that $E(e^{tW}) =$

(c) Evaluate $\lim E(e^{tW/\sigma_W})$. What is the limiting distribution of
$W/\sigma_W$?

11.5 The Equality of Two Distributions

In Sections 11.3 and 11.4, some tests of hypotheses about one
distribution were investigated. In this section, as in the next section,
various tests of the equality of two distributions are studied. By the
equality of two distributions, we mean that the two distribution
functions, say $F$ and $G$, have $F(z) = G(z)$ for all values of $z$.

The first test that we discuss is a natural extension of the chi-square
test. Let $X$ and $Y$ be independent variables with distribution
functions $F(x)$ and $G(y)$, respectively. We wish to test the
hypothesis that $F(z) = G(z)$, for all $z$. Let us partition the real line into
$k$ mutually disjoint sets $A_1, A_2, \ldots, A_k$. Define

$$p_{i1} = \Pr(X \in A_i), \quad i = 1, 2, \ldots, k,$$

and

$$p_{i2} = \Pr(Y \in A_i), \quad i = 1, 2, \ldots, k.$$

If $F(z) = G(z)$, for all $z$, then $p_{i1} = p_{i2}$, $i = 1, 2, \ldots, k$. Accordingly, the
hypothesis that $F(z) = G(z)$, for all $z$, is replaced by the less restrictive
hypothesis

$$H_0 : p_{i1} = p_{i2}, \quad i = 1, 2, \ldots, k.$$

But this is exactly the problem of testing the equality of two
multinomial distributions that was considered in Example 3, Section
6.6, and the reader is referred to that example for the details.

Some statisticians prefer a procedure which eliminates some of the
subjectivity of selecting the partitions. For a fixed positive integer $k$,
proceed as follows. Consider a random sample of size $m$ from the
distribution of $X$ and an independent random sample of size $n$ from
the distribution of $Y$. Let the experimental values be denoted by
$x_1, x_2, \ldots, x_m$ and $y_1, y_2, \ldots, y_n$. Then combine the two samples into
one sample of size $m + n$ and order the $m + n$ values (not their absolute
values) in ascending order of magnitude. These ordered items are then
partitioned into $k$ parts in such a way that each part has the same
number of items. (If the sample sizes are such that this is impossible,
a partition with approximately the same number of items in each group
suffices.) In effect, then, the partition $A_1, A_2, \ldots, A_k$ is determined by
the experimental values themselves. This does not alter the fact that the
statistic, discussed in Example 3, Section 6.6, has a limiting distribution
that is $\chi^2(k-1)$. Accordingly, the procedures used in that example may
be used here.

Among the tests of this type there is one that is frequently used.
It is essentially a test of the equality of the medians of two
distributions. To simplify the discussion, we assume that $m + n$,
the size of the combined sample, is an even number, say $m + n = 2h$,
where $h$ is a positive integer. We take $k = 2$ and then the combined
sample of size $m + n = 2h$, which has been ordered, is separated into
two parts, a "lower half" and an "upper half," each containing
$h = (m + n)/2$ of the experimental values of $X$ and $Y$. The statistic,
suggested by Example 3, Section 6.6, could be used because it has, when
$H_0$ is true, a limiting distribution that is $\chi^2(1)$. However, it is more
interesting to find the exact distribution of another statistic which
enables us to test the hypothesis $H_0$ against the alternative
$H_1 : F(z) \ge G(z)$ or against the alternative $H_1 : F(z) \le G(z)$ as opposed
to merely $F(z) \ne G(z)$. [Here, and in the sequel, alternatives
$F(z) \ge G(z)$ and $F(z) \le G(z)$ and $F(z) \ne G(z)$ mean that strict
inequality holds on some set of positive probability measure.]
This other statistic is $V$, which is the number of observed values of $X$
that are in the lower half of the combined sample. If the observed value
of $V$ is quite large, one might suspect that the median of the distribution
of $X$ is smaller than that of the distribution of $Y$. Thus the critical region
of this test of the hypothesis $H_0 : F(z) = G(z)$, for all $z$, against
$H_1 : F(z) \ge G(z)$ is of the form $V \ge c$. Because our combined sample
is of even size, there is no unique median of the sample. However, one
can arbitrarily insert a number between the $h$th and $(h + 1)$st ordered
items and call it the median of the sample. On this account, a test of
the sort just described is called a median test. Incidentally, if the
alternative hypothesis is $H_1 : F(z) \le G(z)$, the critical region is of the
form $V \le c$.

The distribution of $V$ is quite easy to find if the distribution
functions $F(x)$ and $G(y)$ are of the continuous type and if $F(z) = G(z)$,
for all $z$. We shall now show that $V$ has a hypergeometric p.d.f. Let
$m + n = 2h$, $h$ a positive integer. To compute $\Pr(V = v)$, we need the
probability that exactly $v$ of $X_1, X_2, \ldots, X_m$ are in the lower half of the
ordered combined sample. Under our assumptions, the probability is
zero that any two of the $2h$ random variables are equal. The smallest
$h$ of the $m + n = 2h$ items can be selected in any one of $\binom{m+n}{h}$ ways.
Each of these ways has the same probability. Of these $\binom{m+n}{h}$ ways,
we need to count the number of those in which exactly $v$ of the
$m$ values of $X$ (and hence $h - v$ of the $n$ values of $Y$)
appear in the lower $h$ items. But this is $\binom{m}{v}\binom{n}{h-v}$. Thus the p.d.f.
of $V$ is the hypergeometric p.d.f.

$$k(v) = \Pr(V = v) = \frac{\binom{m}{v}\binom{n}{h-v}}{\binom{m+n}{h}}, \quad v = 0, 1, 2, \ldots, m,$$
$$\phantom{k(v)} = 0 \quad \text{elsewhere},$$

where $m + n = 2h$.

The reader may be momentarily puzzled by the meaning of $\binom{n}{h-v}$
for $v = 0, 1, 2, \ldots, m$. For example, let $m = 17$, $n = 3$, so that $h = 10$.
Then we have $\binom{3}{10-v}$, $v = 0, 1, \ldots, 17$. However, we take $\binom{n}{h-v}$
to be zero if $h - v$ is negative or if $h - v > n$.
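The hypergeometric p.d.f. of $V$, including the convention that $\binom{n}{h-v}$ is zero outside its natural range, can be sketched as follows. The function names `median_test_pmf` and `median_test_upper_tail` are assumptions made for illustration.

```python
from math import comb


def median_test_pmf(v: int, m: int, n: int) -> float:
    """k(v) = C(m, v) C(n, h - v) / C(m + n, h), where h = (m + n) // 2.

    Returns 0 when h - v is negative or exceeds n, matching the
    convention adopted in the text for the binomial coefficient.
    """
    h = (m + n) // 2
    if v < 0 or v > m or h - v < 0 or h - v > n:
        return 0.0
    return comb(m, v) * comb(n, h - v) / comb(m + n, h)


def median_test_upper_tail(c: int, m: int, n: int) -> float:
    """Pr(V >= c) under H0, the size of the critical region V >= c."""
    return sum(median_test_pmf(v, m, n) for v in range(c, m + 1))


# The case m = 17, n = 3, h = 10: only v = 7, 8, 9, 10 have positive probability.
probs = {v: median_test_pmf(v, 17, 3) for v in range(18)}
print({v: round(q, 4) for v, q in probs.items() if q > 0})
```

With $m = 17$ and $n = 3$ the support shrinks to $v = 7, 8, 9, 10$, which is exactly the situation the convention on $\binom{n}{h-v}$ is meant to handle.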

If $m + n$ is an odd number, say $m + n = 2h + 1$, it is left to the
reader to show that the p.d.f. $k(v)$ gives the probability that exactly $v$
of the $m$ values of $X$ are among the lower $h$ of the combined $2h + 1$
values; that is, exactly $v$ of the $m$ values of $X$ are less than the median
of the combined sample.

If the distribution functions $F(x)$ and $G(y)$ are of the continuous
type, there is another rather simple test of the hypothesis that
$F(z) = G(z)$, for all $z$. This test is based upon the notion of runs of values
of $X$ and of values of $Y$. We shall now explain what we mean by runs.
Let us again combine the sample of $m$ values of $X$ and the sample of
$n$ values of $Y$ into one collection of $m + n$ ordered items arranged in
ascending order of magnitude. With $m = 7$ and $n = 8$ we might find
that the 15 ordered items were in the arrangement

x  yyy  xx  y  x  yy  xxx  yy

Note that in this ordering we have set apart the groups of successive
values of the random variable $X$ and those of the random variable
$Y$. If we read from left to right, we would say that we have a run of
one value of $X$, followed by a run of three values of $Y$, followed by
a run of two values of $X$, and so on. In our example, there is a
total of eight runs. Three are runs of length 1; three are runs of
length 2; and two are runs of length 3. Note that the total number of
runs is always one more than the number of unlike adjacent symbols.

Of what can runs be suggestive? Suppose that with $m = 7$ and
$n = 8$ we have the following ordering:

xxxxx  y  xx  yyyyyyy

To us, this strongly suggests that $F(z) > G(z)$. For if, in fact,
$F(z) = G(z)$ for all $z$, we would anticipate a greater number of runs. And
if the first run of five values of $X$ were interchanged with the last run
of seven values of $Y$, this would suggest that $F(z) < G(z)$. But runs can
be suggestive of other things. For example, with $m = 7$ and $n = 8$,
consider the runs

yyyy  xxxxxxx  yyyy

This suggests to us that the medians of the distributions of $X$ and $Y$
may very well be about the same, but that the "spread" (measured
possibly by the standard deviation) of the distribution of $X$ is
considerably less than that of the distribution of $Y$.
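Counting runs in an ordered arrangement is mechanical, and a short sketch makes the definition concrete. The helper name `run_lengths` is an assumption; each arrangement is written as a string of the symbols x and y in ascending order of the combined sample.

```python
from itertools import groupby


def run_lengths(symbols: str) -> list:
    """Lengths of the maximal blocks of identical adjacent symbols."""
    return [len(list(group)) for _, group in groupby(symbols)]


# The three arrangements discussed above, with m = 7 values of x and n = 8 of y.
first = "xyyyxxyxyyxxxyy"
second = "xxxxxyxxyyyyyyy"
third = "yyyyxxxxxxxyyyy"

for arrangement in (first, second, third):
    lengths = run_lengths(arrangement)
    print(arrangement, len(lengths), lengths)
```

The number of runs equals one plus the number of positions where adjacent symbols differ, which is what grouping identical neighbors counts.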

Let the random variable $R$ equal the number of runs in the
combined sample, once the combined sample has been ordered.
Because our random variables $X$ and $Y$ are of the continuous type, we
may assume that no two of these sample items are equal. We wish to
find the p.d.f. of $R$. To find this distribution, when $F(z) = G(z)$, we shall
suppose that all arrangements of the $m$ values of $X$ and the $n$ values
of $Y$ have equal probabilities. We shall show that

$$\Pr(R = 2k+1) = \frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1}\binom{n-1}{k}}{\binom{m+n}{m}},$$

$$\Pr(R = 2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}}, \tag{1}$$

when $2k$ and $2k + 1$ are elements of the space of $R$.

To prove formulas (1), note that we can select the $m$ positions for
the $m$ values of $X$ from the $m + n$ positions in any one of $\binom{m+n}{m}$ ways.
Since each of these choices yields one arrangement, the probability of
each arrangement is equal to $1\big/\binom{m+n}{m}$. The problem is now to
determine how many of these arrangements yield $R = r$, where $r$
is an integer in the space of $R$. First, let $r = 2k + 1$, where $k$ is a
positive integer. This means that there must be $k + 1$ runs of the
ordered values of $X$ and $k$ runs of the ordered values of $Y$ or vice versa.
Consider first the number of ways of obtaining $k + 1$ runs of
the $m$ values of $X$. We can form $k + 1$ of these runs by inserting $k$
"dividers" into the $m - 1$ spaces between the values of $X$, with no more
than one divider per space. This can be done in any one of
$\binom{m-1}{k}$ ways. Similarly, we can construct $k$ runs of the $n$ values of $Y$
by inserting $k - 1$ dividers into the $n - 1$ spaces between the values of
$Y$, with no more than one divider per space. This can be done in any
one of $\binom{n-1}{k-1}$ ways. The joint operation can be performed in any one
of $\binom{m-1}{k}\binom{n-1}{k-1}$ ways. These two sets of runs can be placed together
to form $r = 2k + 1$ runs. But we could also have $k$ runs of the values
of $X$ and $k + 1$ runs of the values of $Y$. An argument similar to the
preceding shows that this can be effected in any one of
$\binom{m-1}{k-1}\binom{n-1}{k}$ ways. Thus

$$\Pr(R = 2k+1) = \frac{\binom{m-1}{k}\binom{n-1}{k-1} + \binom{m-1}{k-1}\binom{n-1}{k}}{\binom{m+n}{m}},$$

which is the first of formulas (1).

If $r = 2k$, where $k$ is a positive integer, we see that the ordered values
of $X$ and the ordered values of $Y$ must each be separated into $k$ runs.
These operations can be performed in any one of $\binom{m-1}{k-1}$ and $\binom{n-1}{k-1}$
ways, respectively. These two sets of runs can be placed together to
form $r = 2k$ runs. But we may begin with either a run of values of $X$
or a run of values of $Y$. Accordingly, the probability of $2k$ runs is

$$\Pr(R = 2k) = \frac{2\binom{m-1}{k-1}\binom{n-1}{k-1}}{\binom{m+n}{m}},$$

which is the second of formulas (1).
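Formulas (1) can be checked against a brute-force enumeration for small samples. The sketch below is illustrative: `runs_pmf` and `brute_force_pmf` are hypothetical names, and the enumeration simply lists every choice of the $m$ positions for the values of $X$ and counts the runs in each arrangement.

```python
from itertools import combinations, groupby
from math import comb


def runs_pmf(r: int, m: int, n: int) -> float:
    """Pr(R = r) under H0 from formulas (1); assumes m >= 1 and n >= 1."""
    k, odd = divmod(r, 2)
    if r < 2:
        return 0.0
    if odd:   # r = 2k + 1: k + 1 runs of one symbol and k runs of the other
        ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                + comb(m - 1, k - 1) * comb(n - 1, k))
    else:     # r = 2k: k runs of each symbol, starting with either symbol
        ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
    return ways / comb(m + n, m)


def brute_force_pmf(m: int, n: int) -> dict:
    """Enumerate all C(m + n, m) equally likely arrangements and count runs."""
    counts = {}
    for x_positions in combinations(range(m + n), m):
        seq = ["x" if i in x_positions else "y" for i in range(m + n)]
        r = sum(1 for _ in groupby(seq))
        counts[r] = counts.get(r, 0) + 1
    total = comb(m + n, m)
    return {r: c / total for r, c in sorted(counts.items())}


exact = brute_force_pmf(3, 4)
for r, prob in exact.items():
    print(r, round(prob, 4), round(runs_pmf(r, 3, 4), 4))
```

The enumeration grows like $\binom{m+n}{m}$, so it is useful only as a check on the closed form for small $m$ and $n$.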

If the critical region of this run test of the hypothesis
$H_0: F(z) = G(z)$ for all $z$ is of the form $R \le c$, it is easy to compute
$\alpha = \Pr(R \le c; H_0)$, provided that $m$ and $n$ are small. Although it is not
easy to show, the distribution of $R$ can be approximated, with large
sample sizes $m$ and $n$, by a normal distribution with mean

$$\mu = E(R) = \frac{2mn}{m+n} + 1$$

and variance

$$\sigma^2 = \frac{(\mu - 1)(\mu - 2)}{m + n - 1}.$$
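For small $m$ and $n$ the two run-count formulas can be evaluated directly. The following Python sketch tabulates the exact null distribution of $R$; the function name `run_pmf` is ours and is not part of the text. It can be used to confirm that the probabilities sum to one and that the exact mean agrees with $2mn/(m+n) + 1$.

```python
from fractions import Fraction
from math import comb


def run_pmf(m, n):
    """Exact null distribution of R, the total number of runs (m, n >= 1)."""
    total = comb(m + n, m)  # number of equally likely arrangements
    pmf = {}
    for r in range(2, m + n + 1):
        k, odd = divmod(r, 2)
        if odd:  # r = 2k + 1: (k+1 runs of X, k of Y) or (k of X, k+1 of Y)
            ways = (comb(m - 1, k) * comb(n - 1, k - 1)
                    + comb(m - 1, k - 1) * comb(n - 1, k))
        else:    # r = 2k: k runs of each, starting with either X or Y
            ways = 2 * comb(m - 1, k - 1) * comb(n - 1, k - 1)
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


pmf = run_pmf(4, 3)
mean = sum(r * p for r, p in pmf.items())
var = sum(r * r * p for r, p in pmf.items()) - mean ** 2
```

With $m = 4$ and $n = 3$ the mean is $31/7 = 2(12)/7 + 1$ and the variance is $(\mu - 1)(\mu - 2)/(m + n - 1) = 68/49$.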

The run test may also be used to test for randomness. That is, it can
be used as a check to see if it is reasonable to treat $X_1, X_2, \ldots, X_s$ as
a random sample of size $s$ from some continuous distribution. To
facilitate the discussion, take $s$ to be even. We are given the $s$ values
of $X$ to be $x_1, x_2, \ldots, x_s$, which are not ordered by magnitude but by
the order in which they were observed. However, there are $s/2$ of these
values, each of which is smaller than the remaining $s/2$ values. Thus we
have a “lower half” and an “upper half” of these values. In the
sequence $x_1, x_2, \ldots, x_s$, replace each value $x$ that is in the lower half
by the letter L and each value in the upper half by the letter U. Then,
for example, with $s = 10$, a sequence such as

L L L L U L U U U U

may suggest a trend toward increasing values of $X$; that is, these
values of $X$ may not reasonably be looked upon as being the
observations of a random sample. If trend is the only alternative to
randomness, we can make a test based upon $R$ and reject the hypothesis
of randomness if $R \le c$. To make this test, we would use the p.d.f. of
$R$ with $m = n = s/2$. On the other hand if, with $s = 10$, we find a
sequence such as

L U L U L U L U L U,

our suspicions are aroused that there may be a nonrandom effect which
is cyclic even though $R = 10$. Accordingly, to test for a trend or a cyclic
effect, we could use a critical region of the form $R \le c_1$ or $R \ge c_2$.

If the sample size $s$ is odd, the number of sample items in the “upper
half” and the number in the “lower half” will differ by one. Then,
for example, we could use the p.d.f. of $R$ with $m = (s - 1)/2$ and
$n = (s + 1)/2$, or vice versa.
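The labeling and run counting described above can be sketched in a few lines of Python. The helper names `lower_upper` and `count_runs`, and the ten data values, are hypothetical and chosen only to reproduce the first letter sequence of the text; the sketch assumes $s$ is even and the observations are distinct.

```python
from itertools import groupby
from statistics import median


def lower_upper(xs):
    """Label each observation L (lower half) or U (upper half), in observed order."""
    med = median(xs)  # for even s this lies strictly between the two halves
    return "".join("L" if x < med else "U" for x in xs)


def count_runs(labels):
    """Number of runs: maximal blocks of identical consecutive labels."""
    return sum(1 for _ in groupby(labels))


trend_like = lower_upper([1, 2, 3, 4, 6, 5, 7, 8, 9, 10])
r_trend = count_runs(trend_like)
r_cyclic = count_runs("LULULULULU")
```

Here `trend_like` is the sequence LLLLULUUUU with only four runs, while the alternating sequence attains the maximum of ten runs.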
EXERCISES


11.28. Let 3.1, 5.6, 4.7, 3.8, 4.2, 3.0, 5.1, 3.9, 4.8 and 5.3, 4.0, 4.9, 6.2, 3.7,
5.0, 6.5, 4.5, 5.5, 5.9, 4.4, 5.8 be observed independent samples of sizes
$m = 9$ and $n = 12$ from two distributions. With $k = 3$, use a chi-square test
to test, with $\alpha = 0.05$ approximately, the equality of the two distributions.

11.29. In the median test, with $m = 9$ and $n = 7$, find the p.d.f. of the random
variable $V$, the number of values of $X$ in the lower half of the combined
sample. In particular, what are the values of the probabilities $\Pr(V = 0)$ and
$\Pr(V = 9)$?

11.30. In the notation of the text, use the median test and the data given in
Exercise 11.28 to test, with $\alpha = 0.05$, approximately, the hypothesis of the
equality of the two distributions against the alternative hypothesis that
$F(z) \ge G(z)$. If the exact probabilities are too difficult to determine for
$m = 9$ and $n = 12$, approximate these probabilities.

11.31. Using the notation of this section, let $U$ be the number of observed
values of $X$ in the smallest $d$ items of the combined sample of $m + n$ items.
Argue that

$$\Pr(U = u) = \frac{\binom{m}{u}\binom{n}{d-u}}{\binom{m+n}{d}}, \qquad u = 0, 1, \ldots, m.$$

The statistic $U$ could be used to test the equality of the $(100p)$th percentiles,
where $(m + n)p = d$, of the distributions of $X$ and $Y$.

11.32. In the discussion of the run test, let the random variables $R_1$ and $R_2$
be, respectively, the number of runs of the values of $X$ and the number of
runs of the values of $Y$. Then $R = R_1 + R_2$. Let the pair $(r_1, r_2)$ of integers
be in the space of $(R_1, R_2)$; then $|r_1 - r_2| \le 1$. Show that the joint p.d.f. of
$R_1$ and $R_2$ is

$$\frac{2\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}}$$

if $r_1 = r_2$; that this joint p.d.f. is

$$\frac{\binom{m-1}{r_1-1}\binom{n-1}{r_2-1}}{\binom{m+n}{m}}$$

if $|r_1 - r_2| = 1$; and is zero elsewhere. Show
that the marginal p.d.f. of $R_1$ is

$$\frac{\binom{m-1}{r_1-1}\binom{n+1}{r_1}}{\binom{m+n}{m}}, \qquad r_1 = 1, \ldots, m,$$

and is zero elsewhere. Find $E(R_1)$. In a similar manner, find $E(R_2)$. Compute
$E(R) = E(R_1) + E(R_2)$.



11.6 The Mann-Whitney-Wilcoxon Test 

We return to the problem of testing the equality of two distributions
of the continuous type. Let $X$ and $Y$ be independent random variables
of the continuous type. Let $F(x)$ and $G(y)$ denote, respectively,
the distribution functions of $X$ and $Y$ and let $X_1, X_2, \ldots, X_m$
and $Y_1, Y_2, \ldots, Y_n$ denote independent samples from these
distributions. We shall discuss the Mann-Whitney-Wilcoxon test of the
hypothesis $H_0: F(z) = G(z)$ for all values of $z$.

Let us define

$$Z_{ij} = \begin{cases} 1, & X_i < Y_j, \\ 0, & X_i > Y_j, \end{cases}$$

and consider the statistic

$$U = \sum_{j=1}^{n}\sum_{i=1}^{m} Z_{ij}.$$

We note that

$$\sum_{i=1}^{m} Z_{ij}$$

counts the number of values of $X$ that are less than $Y_j$, $j = 1, 2, \ldots, n$.
Thus $U$ is the sum of these $n$ counts. For example, with $m = 4$ and
$n = 3$, consider the observations

$$x_2 < y_3 < x_1 < x_4 < y_1 < x_3 < y_2.$$

There are three values of $x$ that are less than $y_1$; there are four values
of $x$ that are less than $y_2$; and there is one value of $x$ that is less than
$y_3$. Thus the experimental value of $U$ is $u = 3 + 4 + 1 = 8$.
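The counting definition of $U$ translates directly into code. In the sketch below the function name `mann_whitney_u` is ours, and the numerical values are hypothetical: they are simply the positions 1 through 7 assigned so that the four $x$ values and three $y$ values fall in an order that yields the counts 3, 4, and 1 of the example.

```python
def mann_whitney_u(xs, ys):
    """U = number of pairs (i, j) with x_i < y_j (ties assumed absent)."""
    return sum(1 for y in ys for x in xs if x < y)


# Positions in the ordering x2 < y3 < x1 < x4 < y1 < x3 < y2.
xs = [3, 1, 6, 4]  # x1, x2, x3, x4
ys = [5, 7, 2]     # y1, y2, y3
u = mann_whitney_u(xs, ys)
```

The double loop costs $mn$ comparisons, which is fine for small samples; for larger ones the rank-sum identity given later in the section is cheaper.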

Clearly, the smallest value which $U$ can take is zero, and the largest
value is $mn$. Thus the space of $U$ is $\{u : u = 0, 1, 2, \ldots, mn\}$. If $U$ is
large, the values of $Y$ tend to be larger than the values of $X$, and this
suggests that $F(z) \ge G(z)$ for all $z$. On the other hand, a small value of
$U$ suggests that $F(z) \le G(z)$ for all $z$. Thus, if we test the hypothesis
$H_0: F(z) = G(z)$ for all $z$ against the alternative hypothesis
$H_1: F(z) \ge G(z)$ for all $z$, the critical region is of the form $U \ge c_1$. If
the alternative hypothesis is $H_1: F(z) \le G(z)$ for all $z$, the critical region
is of the form $U \le c_2$. To determine the size of a critical region, we need
the distribution of $U$ when $H_0$ is true.


If $u$ belongs to the space of $U$, let us denote $\Pr(U = u)$ by the symbol
$h(u; m, n)$. This notation focuses attention on the sample sizes $m$
and $n$. To determine the probability $h(u; m, n)$, we first note that
we have $m + n$ positions to be filled by $m$ values of $X$ and $n$ values of
$Y$. We can fill $m$ positions with the values of $X$ in any one of $\binom{m+n}{m}$
ways. Once this has been done, the remaining $n$ positions can be
filled with the values of $Y$. When $H_0$ is true, each of these arrangements
has the same probability, $1\big/\binom{m+n}{m}$. The final right-hand position of
an arrangement may be either a value of $X$ or a value of $Y$. This position
can be filled in any one of $m + n$ ways, $m$ of which are favorable to $X$
and $n$ of which are favorable to $Y$. Accordingly, the probability that
an arrangement ends with a value of $X$ is $m/(m + n)$ and the probability
that an arrangement terminates with a value of $Y$ is $n/(m + n)$.

Now $U$ can equal $u$ in two mutually exclusive and exhaustive ways:
(1) The final right-hand position (the largest of the $m + n$ values) in the
arrangement may be a value of $X$ and the remaining $m - 1$ values of
$X$ and the $n$ values of $Y$ can be arranged so as to have $U = u$. The
probability that $U = u$, given an arrangement that terminates with a
value of $X$, is given by $h(u; m - 1, n)$. Or (2) the largest value in the
arrangement can be a value of $Y$. This value of $Y$ is greater than $m$
values of $X$. If we are to have $U = u$, the sum of $n - 1$ counts of the
$m$ values of $X$ with respect to the remaining $n - 1$ values of $Y$ must be
$u - m$. Thus the probability that $U = u$, given an arrangement that
terminates in a value of $Y$, is given by $h(u - m; m, n - 1)$. Accordingly,
the probability that $U = u$ is

$$h(u; m, n) = \left(\frac{m}{m+n}\right) h(u; m - 1, n) + \left(\frac{n}{m+n}\right) h(u - m; m, n - 1).$$


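We impose the following reasonable restrictions upon the function
$h(u; m, n)$:

$$h(u; 0, n) = \begin{cases} 1, & u = 0, \\ 0, & u > 0, \end{cases} \qquad n \ge 1,$$

and

$$h(u; m, 0) = \begin{cases} 1, & u = 0, \\ 0, & u > 0, \end{cases} \qquad m \ge 1,$$

and

$$h(u; m, n) = 0, \qquad u < 0, \quad m \ge 0, \quad n \ge 0.$$

Then it is easy, for small values $m$ and $n$, to compute these
probabilities. For example, if $m = n = 1$, we have

$$h(0; 1, 1) = \tfrac12 h(0; 0, 1) + \tfrac12 h(-1; 1, 0) = \tfrac12 \cdot 1 + \tfrac12 \cdot 0 = \tfrac12,$$
$$h(1; 1, 1) = \tfrac12 h(1; 0, 1) + \tfrac12 h(0; 1, 0) = \tfrac12 \cdot 0 + \tfrac12 \cdot 1 = \tfrac12;$$

and if $m = 1$, $n = 2$, we have

$$h(0; 1, 2) = \tfrac13 h(0; 0, 2) + \tfrac23 h(-1; 1, 1) = \tfrac13 \cdot 1 + \tfrac23 \cdot 0 = \tfrac13,$$
$$h(1; 1, 2) = \tfrac13 h(1; 0, 2) + \tfrac23 h(0; 1, 1) = \tfrac13 \cdot 0 + \tfrac23 \cdot \tfrac12 = \tfrac13,$$
$$h(2; 1, 2) = \tfrac13 h(2; 0, 2) + \tfrac23 h(1; 1, 1) = \tfrac13 \cdot 0 + \tfrac23 \cdot \tfrac12 = \tfrac13.$$

In Exercise 11.33 the reader is to determine the distribution of $U$ when
$m = 2$, $n = 1$; $m = 2$, $n = 2$; $m = 1$, $n = 3$; and $m = 3$, $n = 1$.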
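The recursion and its boundary conditions can be coded almost verbatim. The sketch below is one possible rendering, using exact rational arithmetic and memoization; the function name `h` mirrors the symbol of the text, and the treatment of the case $m = n = 0$ (probability one at $u = 0$) is our own convention.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def h(u, m, n):
    """Pr(U = u) under H0 for sample sizes m and n, via the recursion."""
    if u < 0:
        return Fraction(0)
    if m == 0 or n == 0:          # boundary conditions of the text
        return Fraction(1) if u == 0 else Fraction(0)
    return (Fraction(m, m + n) * h(u, m - 1, n)
            + Fraction(n, m + n) * h(u - m, m, n - 1))


dist_3_2 = [h(u, 3, 2) for u in range(3 * 2 + 1)]
mean_3_2 = sum(u * p for u, p in enumerate(dist_3_2))
var_3_2 = sum(u * u * p for u, p in enumerate(dist_3_2)) - mean_3_2 ** 2
```

For $m = 3$, $n = 2$ the probabilities sum to one, the mean is $mn/2 = 3$, and the variance is $mn(m + n + 1)/12 = 3$.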

For large values of $m$ and $n$, it is desirable to use an approximate
distribution of $U$. Consider the mean and the variance of $U$ when
the hypothesis $H_0: F(z) = G(z)$, for all values of $z$, is true. Since
$U = \sum_{j=1}^{n}\sum_{i=1}^{m} Z_{ij}$, then

$$E(U) = \sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}).$$

But

$$E(Z_{ij}) = (1)\Pr(X_i < Y_j) + (0)\Pr(X_i > Y_j) = \tfrac12$$

because, when $H_0$ is true, $\Pr(X_i < Y_j) = \Pr(X_i > Y_j) = \tfrac12$. Thus

$$E(U) = \sum_{j=1}^{n}\sum_{i=1}^{m} \left(\tfrac12\right) = \frac{mn}{2}.$$

To compute the variance of $U$, we first find

$$\begin{aligned}
E(U^2) &= \sum_{k=1}^{n}\sum_{h=1}^{m}\sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}Z_{hk}) \\
&= \sum_{j=1}^{n}\sum_{i=1}^{m} E(Z_{ij}^2)
 + \mathop{\sum_{k=1}^{n}\sum_{j=1}^{n}}_{k \ne j}\sum_{i=1}^{m} E(Z_{ij}Z_{ik}) \\
&\quad + \sum_{j=1}^{n}\mathop{\sum_{h=1}^{m}\sum_{i=1}^{m}}_{h \ne i} E(Z_{ij}Z_{hj})
 + \mathop{\sum_{k=1}^{n}\sum_{j=1}^{n}}_{k \ne j}\ \mathop{\sum_{h=1}^{m}\sum_{i=1}^{m}}_{h \ne i} E(Z_{ij}Z_{hk}).
\end{aligned}$$

Note that there are $mn$ terms in the first of these sums, $mn(n - 1)$
in the second, $mn(m - 1)$ in the third, and $mn(m - 1)(n - 1)$ in the
fourth. When $H_0$ is true, we know that $X_i$, $X_h$, $Y_j$, and $Y_k$, $i \ne h$, $j \ne k$,
are independent and have the same distribution of the continuous type.
Thus $\Pr(X_i < Y_j) = \tfrac12$. Moreover, $\Pr(X_i < Y_j, X_i < Y_k) = \tfrac13$ because
this is the probability that a designated one of three items is less than
each of the other two. Similarly, $\Pr(X_i < Y_j, X_h < Y_j) = \tfrac13$. Finally,
$\Pr(X_i < Y_j, X_h < Y_k) = \Pr(X_i < Y_j)\Pr(X_h < Y_k) = \tfrac14$. Hence we have

$$E(Z_{ij}^2) = (1)^2 \Pr(X_i < Y_j) = \tfrac12,$$

$$E(Z_{ij}Z_{ik}) = (1)(1)\Pr(X_i < Y_j, X_i < Y_k) = \tfrac13, \qquad j \ne k,$$

$$E(Z_{ij}Z_{hj}) = (1)(1)\Pr(X_i < Y_j, X_h < Y_j) = \tfrac13, \qquad i \ne h,$$

and

$$E(Z_{ij}Z_{hk}) = (1)(1)\Pr(X_i < Y_j, X_h < Y_k) = \tfrac14, \qquad i \ne h, \ j \ne k.$$
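Thus

$$E(U^2) = \frac{mn}{2} + \frac{mn(n-1)}{3} + \frac{mn(m-1)}{3} + \frac{mn(m-1)(n-1)}{4}$$

and

$$\sigma_U^2 = mn\left[\frac12 + \frac{n-1}{3} + \frac{m-1}{3} + \frac{(m-1)(n-1)}{4} - \frac{mn}{4}\right] = \frac{mn(m+n+1)}{12}.$$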
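The reduction of the variance of $U$ to $mn(m + n + 1)/12$ is a short piece of algebra. After subtracting $[E(U)]^2 = m^2n^2/4$ from $E(U^2)$ and factoring out $mn$, the remaining bracket can be simplified as in the following worked step (our own expansion, using only the four expectations $\tfrac12$, $\tfrac13$, $\tfrac13$, $\tfrac14$ and the term counts given in the text):

```latex
\begin{align*}
\frac12 + \frac{n-1}{3} + \frac{m-1}{3} + \frac{(m-1)(n-1)}{4} - \frac{mn}{4}
  &= \frac12 + \frac{m+n-2}{3} + \frac{1-m-n}{4} \\
  &= \frac{6 + 4(m+n) - 8 + 3 - 3(m+n)}{12} \\
  &= \frac{m+n+1}{12}.
\end{align*}
```

Multiplying by $mn$ gives $\sigma_U^2 = mn(m+n+1)/12$.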


Although it is fairly difficult to prove, it is true, when $F(z) = G(z)$ for
all $z$, that

$$\frac{U - \dfrac{mn}{2}}{\sqrt{\dfrac{mn(m+n+1)}{12}}}$$

has, if each of $m$ and $n$ is large, an approximate distribution that is
$N(0, 1)$. This fact enables us to compute, approximately, various
significance levels.

Prior to the introduction of the statistic $U$ in the statistical literature,
it had been suggested that a test of $H_0: F(z) = G(z)$, for all $z$, be based
upon the following statistic, say $T$ (not Student's $t$). Let $T$ be the sum
of the ranks of $Y_1, Y_2, \ldots, Y_n$ among the $m + n$ items $X_1, \ldots, X_m$,
$Y_1, \ldots, Y_n$, once this combined sample has been ordered. In Exercise
11.35 the reader is asked to show that

$$U = T - \frac{n(n+1)}{2}.$$

This formula provides another method of computing $U$ and it shows
that a test of $H_0$ based on $U$ is equivalent to a test based on $T$. A
generalization of $T$ is considered in Section 11.8.

Example 1. With the assumptions and the notation of this section, let
$m = 10$ and $n = 9$. Let the observed values of $X$ be as given in the first row
and the observed values of $Y$ as in the second row of the following display:

4.3, 5.9, 4.9, 3.1, 5.3, 6.4, 6.2, 3.8, 7.5, 5.8,

5.5, 7.9, 6.8, 9.0, 5.6, 6.3, 8.5, 4.6, 7.1.

Since, in the combined sample, the ranks of the values of $y$ are 4, 7, 8, 12, 14,
15, 17, 18, 19, we have the experimental value of $T$ to be equal to $t = 114$. Thus
$u = 114 - 45 = 69$. If $F(z) = G(z)$ for all $z$, then, approximately,

$$0.05 = \Pr\left(\frac{U - 45}{12.247} \ge 1.645\right) = \Pr(U \ge 65.146).$$

Accordingly, at the 0.05 significance level, we reject the hypothesis $H_0: F(z) =
G(z)$, for all $z$, and accept the alternative hypothesis $H_1: F(z) \ge G(z)$, for
all $z$.
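The arithmetic of Example 1 can be retraced from the ranks alone. The short Python sketch below starts from the nine $Y$ ranks quoted in the example and computes $t$, $u$, the null mean and standard deviation of $U$, and the approximate 0.05 cutoff; the variable names are ours.

```python
from math import sqrt

m, n = 10, 9
y_ranks = [4, 7, 8, 12, 14, 15, 17, 18, 19]  # ranks of the y values, from the text

t = sum(y_ranks)                       # rank-sum statistic T
u = t - n * (n + 1) // 2               # U = T - n(n + 1)/2
mean_u = m * n / 2                     # E(U) under H0
sd_u = sqrt(m * n * (m + n + 1) / 12)  # standard deviation of U under H0
z = (u - mean_u) / sd_u                # standardized value of U
cutoff = mean_u + 1.645 * sd_u         # approximate upper 0.05 critical value
```

The observed $u = 69$ exceeds the cutoff of about 65.15, which is why $H_0$ is rejected at the 0.05 level.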


EXERCISES 

11.33. Compute the distribution of $U$ in each of the following cases: (a) $m = 2$,
$n = 1$; (b) $m = 2$, $n = 2$; (c) $m = 1$, $n = 3$; (d) $m = 3$, $n = 1$.

11.34. Suppose that the hypothesis $H_0: F(z) = G(z)$, for all $z$, is not true. Let
$p = \Pr(X_i < Y_j)$. Show that $U/mn$ is an unbiased estimator of $p$ and that
it converges in probability to $p$ as $m \to \infty$ and $n \to \infty$.

11.35. Show that $U = T - [n(n + 1)]/2$.

Hint: Let $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$ be the order statistics of the random
sample $Y_1, Y_2, \ldots, Y_n$. If $R_i$ is the rank of $Y_{(i)}$ in the combined ordered
sample, note that $Y_{(i)}$ is greater than $R_i - i$ values of $X$.

11.36. In Example 1 of this section assume that the values came from two
normal distributions with means $\mu_1$ and $\mu_2$, respectively, and with common
variance $\sigma^2$. Calculate the Student's $t$ which is used to test the hypothesis
$H_0: \mu_1 = \mu_2$. If the alternative hypothesis is $H_1: \mu_1 < \mu_2$, do we accept or
reject $H_0$ at the 0.05 significance level?
11.7 Distributions Under Alternative Hypotheses

In this section we discuss certain problems that are related to a
nonparametric test when the hypothesis $H_0$ is not true. Let $X$ and $Y$
be independent random variables of the continuous type with
distribution functions $F(x)$ and $G(y)$, respectively, and probability
density functions $f(x)$ and $g(y)$. Let $X_1, X_2, \ldots, X_m$ and $Y_1, Y_2, \ldots, Y_n$
denote independent random samples from these distributions.
Consider the hypothesis $H_0: F(z) = G(z)$ for all values of $z$. It has been
seen that the test of this hypothesis may be based upon the statistic $U$
which, when the hypothesis is true, has a distribution that does not
depend upon $F(z) = G(z)$. Or this test can be based upon the statistic
$T = U + n(n + 1)/2$, where $T$ is the sum of the ranks of $Y_1, Y_2, \ldots, Y_n$
in the combined sample. To elicit some information about the
distribution of $T$ when the alternative hypothesis is true, let us consider
the joint distribution of the ranks of these values of $Y$.

Let $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$ be the order statistics of the sample
$Y_1, Y_2, \ldots, Y_n$. Order the combined sample, and let $R_i$ be the rank of
$Y_{(i)}$, $i = 1, 2, \ldots, n$. Thus there are $i - 1$ values of $Y$ and $R_i - i$ values
of $X$ that are less than $Y_{(i)}$. Moreover, there are $R_i - R_{i-1} - 1$ values
of $X$ between $Y_{(i-1)}$ and $Y_{(i)}$. If it is given that $Y_{(1)} = y_1 < Y_{(2)} =
y_2 < \cdots < Y_{(n)} = y_n$, then the conditional probability

$$\Pr(R_1 = r_1, R_2 = r_2, \ldots, R_n = r_n \mid y_1 < y_2 < \cdots < y_n), \qquad (1)$$

where $r_1 < r_2 < \cdots < r_n \le m + n$ are positive integers, can be
computed by using the multinomial p.d.f. in the following manner.
Define the following sets: $A_1 = \{x : -\infty < x < y_1\}$, $A_i = \{x : y_{i-1} <
x < y_i\}$, $i = 2, \ldots, n$, $A_{n+1} = \{x : y_n < x < \infty\}$. The conditional
probabilities of these sets are, respectively, $p_1 = F(y_1)$, $p_2 =
F(y_2) - F(y_1)$, $\ldots$, $p_n = F(y_n) - F(y_{n-1})$, $p_{n+1} = 1 - F(y_n)$. Then the
conditional probability of display (1) is given by

$$\frac{m!\; p_1^{\,r_1 - 1}\, p_2^{\,r_2 - r_1 - 1} \cdots p_n^{\,r_n - r_{n-1} - 1}\, p_{n+1}^{\,m + n - r_n}}{(r_1 - 1)!\,(r_2 - r_1 - 1)! \cdots (r_n - r_{n-1} - 1)!\,(m + n - r_n)!}.$$

To find the unconditional probability $\Pr(R_1 = r_1, R_2 = r_2, \ldots,
R_n = r_n)$, which we denote simply by $\Pr(r_1, \ldots, r_n)$, we multiply the
conditional probability by the joint p.d.f. of $Y_{(1)} < Y_{(2)} < \cdots < Y_{(n)}$,
namely $n!\, g(y_1) g(y_2) \cdots g(y_n)$, and then integrate on $y_1, y_2, \ldots, y_n$.
That is,

$$\Pr(r_1, r_2, \ldots, r_n) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{y_3}\int_{-\infty}^{y_2} \Pr(r_1, \ldots, r_n \mid y_1 < \cdots < y_n)\, n!\, g(y_1) \cdots g(y_n)\, dy_1 \cdots dy_n,$$

where $\Pr(r_1, \ldots, r_n \mid y_1 < \cdots < y_n)$ denotes the conditional probability
in display (1).

Now that we have the joint distribution of $R_1, R_2, \ldots, R_n$, we can
find, theoretically, the distributions of functions of $R_1, R_2, \ldots, R_n$ and,
in particular, the distribution of $T = \sum_{i=1}^{n} R_i$. From the latter we can find
that of $U = T - n(n + 1)/2$. To point out the extremely tedious
computational problems of distribution theory that we encounter, we
give an example. In this example we use the assumptions of this section.

Example 1. Suppose that an hypothesis $H_0$ is not true but that in fact
$f(x) = 1$, $0 < x < 1$, zero elsewhere, and $g(y) = 2y$, $0 < y < 1$, zero elsewhere.
Let $m = 3$ and $n = 2$. Note that the space of $U$ is the set $\{u : u = 0, 1, \ldots, 6\}$.
Consider $\Pr(U = 5)$. This event $U = 5$ occurs when and only when $R_1 = 3$,
$R_2 = 5$, since in this section $R_1 < R_2$ are the ranks of $Y_{(1)} < Y_{(2)}$ in the combined
sample and $U = R_1 + R_2 - 3$. Because $F(x) = x$, $0 < x < 1$, we have

$$\begin{aligned}
\Pr(U = 5) &= \Pr(R_1 = 3, R_2 = 5) \\
&= \int_0^1\!\int_0^{y_2} \left[\frac{3!}{2!\,1!}\, y_1^2 (y_2 - y_1)\right] 2!\,(2y_1)(2y_2)\, dy_1\, dy_2 = \frac{6}{35}.
\end{aligned}$$

Consider next $\Pr(U = 4)$. The event $U = 4$ occurs if $R_1 = 2$, $R_2 = 5$ or if
$R_1 = 3$, $R_2 = 4$. Thus

$$\Pr(U = 4) = \Pr(R_1 = 2, R_2 = 5) + \Pr(R_1 = 3, R_2 = 4);$$

the computation of each of these probabilities is similar to that of
$\Pr(R_1 = 3, R_2 = 5)$. This procedure may be continued until we have computed
$\Pr(U = u)$ for each $u \in \{u : u = 0, 1, \ldots, 6\}$.

In the preceding example the probability density functions and the
sample sizes $m$ and $n$ were selected so as to provide relatively simple
integrations. The reader can discover for himself or herself how
tedious, and even difficult, the computations become if the sample sizes
are large or if the probability density functions are not of a simple
functional form.
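For the particular densities of the example, $G(y) = y^2 = [F(y)]^2$ on $(0, 1)$, so the alternative is of the Lehmann type with exponent 2 that appears in Exercise 11.38. Taking the closed form stated in that exercise as given (the exercise asks the reader to prove it), the whole distribution of $U$ can be tabulated without integration. The sketch below is only an illustration under that assumption; the function names are ours.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def lehmann2_rank_prob(ranks, m):
    """Pr(r_1, ..., r_n) when G = F^2, using the formula of Exercise 11.38."""
    n = len(ranks)
    num = 2 ** n
    for j, r in enumerate(ranks):      # factors r_1, r_2 + 1, ..., r_n + n - 1
        num *= r + j
    den = comb(m + n, m)
    for j in range(1, n + 1):          # factors (m + n + 1) ... (m + 2n)
        den *= m + n + j
    return Fraction(num, den)


def u_distribution(m, n):
    """Distribution of U = T - n(n + 1)/2 under the alternative G = F^2."""
    dist = {}
    for ranks in combinations(range(1, m + n + 1), n):  # r_1 < ... < r_n
        u = sum(ranks) - n * (n + 1) // 2
        dist[u] = dist.get(u, Fraction(0)) + lehmann2_rank_prob(ranks, m)
    return dist


dist = u_distribution(3, 2)
```

With $m = 3$ and $n = 2$ this gives $\Pr(U = 5) = 6/35$, in agreement with direct evaluation of the double integral for $\Pr(R_1 = 3, R_2 = 5)$, and $\Pr(U = 4) = 9/35$.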

EXERCISES 


11.37. Let the probability density functions of $X$ and $Y$ be those given in
Example 1 of this section. Further, let the sample sizes be $m = 5$ and $n = 3$.
If $R_1 < R_2 < R_3$ are the ranks of $Y_{(1)} < Y_{(2)} < Y_{(3)}$ in the combined sample,
compute $\Pr(R_1 = 2, R_2 = 6, R_3 = 8)$.

11.38. Let $X_1, X_2, \ldots, X_m$ be a random sample of size $m$ from a distribution
of the continuous type with distribution function $F(x)$ and p.d.f.
$F'(x) = f(x)$. Let $Y_1, Y_2, \ldots, Y_n$ be a random sample from a distribution
with distribution function $G(y) = [F(y)]^{\theta}$, $0 < \theta$. If $\theta \ne 1$, this distribution
is called a Lehmann alternative. With $\theta = 2$, show that

$$\Pr(r_1, r_2, \ldots, r_n) = \frac{2^n\, r_1 (r_2 + 1)(r_3 + 2) \cdots (r_n + n - 1)}{\dbinom{m+n}{m}(m + n + 1)(m + n + 2) \cdots (m + 2n)}.$$

11.39. Let $X_1, X_2, X_3$ be a random sample from a continuous-type distribution
with distribution function $F(x)$ and p.d.f. $f(x) = F'(x)$. Let $Y_1, Y_2$ be a
random sample of size $n = 2$ from a distribution with distribution function
$G(y) = [F(y)]^2$. In the combined sample of 5, determine the probability
that the $Y$ values have ranks 1 and 3; that is, the order is $y\,x\,y\,x\,x$.

11.40. To generalize the results of Exercise 11.38, let $G(y) = h[F(y)]$, where
$h(z)$ is a differentiable function such that $h(0) = 0$, $h(1) = 1$, and $h'(z) > 0$,
$0 < z < 1$. Show that

$$\Pr(r_1, r_2, \ldots, r_n) = \frac{E[h'(V_{r_1})\, h'(V_{r_2}) \cdots h'(V_{r_n})]}{\dbinom{m+n}{m}},$$

where $V_1 < V_2 < \cdots < V_{m+n}$ are the order statistics of a random sample
of size $m + n$ from the uniform distribution over the interval (0, 1).


11.8 Linear Rank Statistics 

In this section we consider a type of distribution-free statistic that
is, among other things, a generalization of the Mann-Whitney-Wilcoxon
statistic. Let $V_1, V_2, \ldots, V_N$ be a random sample of
size $N$ from a distribution of the continuous type. Let $R_i$ be the
rank of $V_i$ among $V_1, V_2, \ldots, V_N$, $i = 1, 2, \ldots, N$; and let $c(i)$
be a scoring function defined on the first $N$ positive integers; that is,
let $c(1), c(2), \ldots, c(N)$ be some appropriately selected constants. If
$a_1, a_2, \ldots, a_N$ are constants, then a statistic of the form

$$L = \sum_{i=1}^{N} a_i c(R_i)$$

is called a linear rank statistic.

To see that this type of statistic is actually a generalization of both
the Mann-Whitney-Wilcoxon statistic and also that statistic
associated with the median test, let $N = m + n$ and

$$V_1 = X_1, \ldots, V_m = X_m, \quad V_{m+1} = Y_1, \ldots, V_N = Y_n.$$

These two special statistics result from the following respective
assignments for $c(i)$ and $a_1, a_2, \ldots, a_N$:

1. Take $c(i) = i$, $a_1 = \cdots = a_m = 0$ and $a_{m+1} = \cdots = a_N = 1$, so that

$$L = \sum_{i=1}^{N} a_i c(R_i) = \sum_{i=m+1}^{m+n} R_i,$$

which is the sum of the ranks of $Y_1, Y_2, \ldots, Y_n$ among the $m + n$
observations (a statistic denoted by $T$ in Section 11.6).

2. Take $c(i) = 1$, provided that $i \le (m + n)/2$, zero otherwise. If
$a_1 = \cdots = a_m = 1$ and $a_{m+1} = \cdots = a_N = 0$, then

$$L = \sum_{i=1}^{N} a_i c(R_i) = \sum_{i=1}^{m} c(R_i),$$

which is equal to the number of the $m$ values of $X$ that are in the
lower half of the combined sample of $m + n$ observations (a statistic
used in the median test of Section 11.5).

To determine the mean and the variance of $L$, we make some
observations about the joint and marginal distributions of the ranks
$R_1, R_2, \ldots, R_N$. Clearly, from the results of Section 4.6 on the
distribution of order statistics of a random sample, we observe that each
permutation of the ranks has the same probability,

$$\Pr(R_1 = r_1, R_2 = r_2, \ldots, R_N = r_N) = \frac{1}{N!},$$

where $r_1, r_2, \ldots, r_N$ is any permutation of the first $N$ positive
integers. This implies that the marginal p.d.f. of $R_i$ is

$$g_i(r_i) = \frac{1}{N}, \qquad r_i = 1, 2, \ldots, N,$$

zero elsewhere, because the number of permutations in which $R_i = r_i$
is $(N - 1)!$ so that

$$\sum \cdots \sum \frac{1}{N!} = \frac{(N-1)!}{N!} = \frac{1}{N}.$$

In a similar manner, the joint marginal p.d.f. of $R_i$ and $R_j$, $i \ne j$, is

$$g_{ij}(r_i, r_j) = \frac{1}{N(N-1)}, \qquad r_i \ne r_j,$$

zero elsewhere. That is, the $(N - 2)$-fold summation

$$\sum \cdots \sum \frac{1}{N!} = \frac{(N-2)!}{N!} = \frac{1}{N(N-1)},$$

where the summation is over all permutations in which $R_i = r_i$ and
$R_j = r_j$.

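Among other things these properties of the distribution of $R_1,
R_2, \ldots, R_N$ imply that

$$E[c(R_i)] = \sum_{r_i=1}^{N} c(r_i)\left(\frac{1}{N}\right) = \frac{c(1) + \cdots + c(N)}{N}.$$

If, for convenience, we let $c(k) = c_k$, then

$$E[c(R_i)] = \sum_{k=1}^{N} \frac{c_k}{N} = \bar{c},$$

say, for all $i = 1, 2, \ldots, N$. In addition, we have that

$$\operatorname{var}[c(R_i)] = \sum_{r_i=1}^{N} [c(r_i) - \bar{c}]^2 \left(\frac{1}{N}\right) = \sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N}$$

for all $i = 1, 2, \ldots, N$.

A simple expression for the covariance of $c(R_i)$ and $c(R_j)$, $i \ne j$, is
a little more difficult to determine. That covariance is

$$E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} = \mathop{\sum\sum}_{k \ne h} \frac{(c_k - \bar{c})(c_h - \bar{c})}{N(N-1)}.$$

However, since

$$0 = \left[\sum_{k=1}^{N} (c_k - \bar{c})\right]^2 = \sum_{k=1}^{N} (c_k - \bar{c})^2 + \mathop{\sum\sum}_{k \ne h} (c_k - \bar{c})(c_h - \bar{c}),$$

the covariance can be written simply as

$$E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} = -\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}.$$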


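With these results, we first observe that the mean of $L$ is

$$\mu_L = E\left[\sum_{i=1}^{N} a_i c(R_i)\right] = \sum_{i=1}^{N} a_i E[c(R_i)] = \sum_{i=1}^{N} a_i \bar{c} = N\bar{a}\bar{c},$$

where $\bar{a} = (\sum a_i)/N$. Second, note that the variance of $L$ is

$$\begin{aligned}
\sigma_L^2 &= \sum_{i=1}^{N} a_i^2 \operatorname{var}[c(R_i)] + \mathop{\sum\sum}_{i \ne j} a_i a_j E\{[c(R_i) - \bar{c}][c(R_j) - \bar{c}]\} \\
&= \sum_{i=1}^{N} a_i^2 \sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N} + \mathop{\sum\sum}_{i \ne j} a_i a_j \left[-\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}\right] \\
&= \left[\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}\right]\left[(N-1)\sum_{i=1}^{N} a_i^2 - \mathop{\sum\sum}_{i \ne j} a_i a_j\right].
\end{aligned}$$

However, we can determine a substitute for the second factor by
observing that

$$\begin{aligned}
N\sum_{i=1}^{N} (a_i - \bar{a})^2 &= N\sum_{i=1}^{N} a_i^2 - N^2\bar{a}^2 = N\sum_{i=1}^{N} a_i^2 - \left(\sum_{i=1}^{N} a_i\right)^2 \\
&= N\sum_{i=1}^{N} a_i^2 - \left[\sum_{i=1}^{N} a_i^2 + \mathop{\sum\sum}_{i \ne j} a_i a_j\right] \\
&= (N-1)\sum_{i=1}^{N} a_i^2 - \mathop{\sum\sum}_{i \ne j} a_i a_j.
\end{aligned}$$

So, making this substitution in $\sigma_L^2$, we finally have that

$$\sigma_L^2 = \left[\sum_{k=1}^{N} \frac{(c_k - \bar{c})^2}{N(N-1)}\right]\left[N\sum_{i=1}^{N} (a_i - \bar{a})^2\right] = \frac{\sum_{i=1}^{N} (a_i - \bar{a})^2 \sum_{k=1}^{N} (c_k - \bar{c})^2}{N-1}.$$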


In the special case in which $N = m + n$ and

$$L = \sum_{i=m+1}^{N} c(R_i),$$

the reader is asked to show that (Exercise 11.41)

$$\mu_L = n\bar{c}, \qquad \sigma_L^2 = \frac{mn}{N(N-1)}\sum_{k=1}^{N} (c_k - \bar{c})^2.$$

A further simplification when $c_k = c(k) = k$ yields

$$\mu_L = \frac{n(m+n+1)}{2}, \qquad \sigma_L^2 = \frac{mn(m+n+1)}{12};$$

these latter are, respectively, the mean and the variance of the statistic
$T$ as defined in Section 11.6.
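Because every permutation of the ranks is equally likely under the null hypothesis, the formulas $\mu_L = N\bar{a}\bar{c}$ and $\sigma_L^2 = \sum (a_i - \bar{a})^2 \sum (c_k - \bar{c})^2/(N - 1)$ can be checked by brute force for small $N$. The following sketch enumerates all $N!$ rank vectors; the function names `exact_moments` and `formula_moments` are ours.

```python
from fractions import Fraction
from itertools import permutations


def exact_moments(a, c):
    """Mean and variance of L = sum a_i c(R_i), enumerating all N! rank vectors."""
    N = len(a)
    values = [sum(a[i] * c[r - 1] for i, r in enumerate(perm))
              for perm in permutations(range(1, N + 1))]
    mean = Fraction(sum(values), len(values))
    var = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, var


def formula_moments(a, c):
    """Mean and variance of L from the closed-form expressions of the text."""
    N = len(a)
    a_bar = Fraction(sum(a), N)
    c_bar = Fraction(sum(c), N)
    ss_a = sum((x - a_bar) ** 2 for x in a)
    ss_c = sum((x - c_bar) ** 2 for x in c)
    return N * a_bar * c_bar, ss_a * ss_c / (N - 1)


# Rank-sum statistic T with m = 3, n = 2: a = (0, 0, 0, 1, 1), c(k) = k.
t_exact = exact_moments([0, 0, 0, 1, 1], [1, 2, 3, 4, 5])
t_formula = formula_moments([0, 0, 0, 1, 1], [1, 2, 3, 4, 5])
```

For this choice both routes give mean $n(m + n + 1)/2 = 6$ and variance $mn(m + n + 1)/12 = 3$. Enumeration grows as $N!$, so it is useful only as a check for very small samples.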

As in the case of the Mann-Whitney-Wilcoxon statistic, the
determination of the exact distribution of a linear rank statistic $L$
can be very difficult. However, for many selections of the constants
$a_1, a_2, \ldots, a_N$ and the scores $c(1), c(2), \ldots, c(N)$, the ratio $(L - \mu_L)/\sigma_L$
has, for large $N$, an approximate distribution that is $N(0, 1)$. This
approximation is better if the scores $c(k) = c_k$ are like an ideal sample
from a normal distribution, in particular, symmetric and without
extreme values. For example, use of normal scores defined by

$$\frac{k}{N+1} = \int_{-\infty}^{c_k} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{w^2}{2}\right) dw$$

makes the approximation better. However, even with the use of ranks,
$c(k) = k$, the approximation is reasonably good, provided that $N$ is
large enough, say around 30 or greater.

Linear rank statistics are more than a generalization of statistics such
as those of Mann, Whitney, and Wilcoxon; we give two additional
applications of them in the following illustrations.

Example 1. Let $X_1, X_2, \ldots, X_n$ denote $n$ random variables. However,
suppose that we question whether they are observations of a random sample
due either to possible lack of independence or to the fact that $X_1, X_2, \ldots, X_n$
might not have the same distributions. In particular, say we suspect a trend
toward larger and larger values in the sequence $X_1, X_2, \ldots, X_n$. If
$R_i = \operatorname{rank}(X_i)$, a statistic that could be used to test the alternative (trend)
hypothesis is $L = \sum_{i=1}^{n} iR_i$. Under the assumption ($H_0$) that the $n$ random
variables are actually observations of a random sample from a distribution
of the continuous type, the reader is asked to show that (Exercise 11.42)

$$\mu_L = \frac{n(n+1)^2}{4}, \qquad \sigma_L^2 = \frac{n^2(n+1)^2(n-1)}{144}.$$

The critical region of the test is of the form $L \ge d$, and the constant $d$ can be
determined either by using the normal approximation or referring to a
tabulated distribution of $L$ so that $\Pr(L \ge d; H_0)$ is approximately equal to
a desired significance level $\alpha$.

Example 2. Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from
a bivariate distribution of the continuous type. Let $R_i$ be the rank of
$X_i$ among $X_1, X_2, \ldots, X_n$ and $Q_i$ be the rank of $Y_i$ among $Y_1, Y_2, \ldots, Y_n$.
If $X$ and $Y$ have a large positive correlation coefficient, we would anticipate
that $R_i$ and $Q_i$ would tend to be large or small together. In particular,
the correlation coefficient of $(R_1, Q_1), (R_2, Q_2), \ldots, (R_n, Q_n)$, namely the
Spearman rank correlation coefficient,

$$\frac{\sum_{i=1}^{n} (R_i - \bar{R})(Q_i - \bar{Q})}{\sqrt{\sum_{i=1}^{n} (R_i - \bar{R})^2 \sum_{i=1}^{n} (Q_i - \bar{Q})^2}},$$

would tend to be large. Since $R_1, R_2, \ldots, R_n$ and $Q_1, Q_2, \ldots, Q_n$ are
permutations of $1, 2, \ldots, n$, this correlation coefficient can be shown
(Exercise 11.43) to equal

$$\frac{\sum_{i=1}^{n} R_i Q_i - n(n+1)^2/4}{n(n^2 - 1)/12},$$

which in turn equals

$$1 - \frac{6\sum_{i=1}^{n} (R_i - Q_i)^2}{n(n^2 - 1)}.$$

From the first of these two additional expressions for Spearman's statistic, it
is clear that $\sum_{i=1}^{n} R_i Q_i$ is an equivalent statistic for the purpose of testing the
independence of $X$ and $Y$, say $H_0$. However, note that if $H_0$ is true, then the
distribution of $\sum_{i=1}^{n} Q_i R_i$, which is not a linear rank statistic, and $L = \sum_{i=1}^{n} iR_i$
are the same. The reason for this is that the ranks $R_1, R_2, \ldots, R_n$ and the ranks
$Q_1, Q_2, \ldots, Q_n$ are independent because of the independence of $X$ and $Y$.
Hence, under $H_0$, pairing $R_1, R_2, \ldots, R_n$ at random with $1, 2, \ldots, n$ is
distributionally equivalent to pairing those ranks with $Q_1, Q_2, \ldots, Q_n$, which
is simply a permutation of $1, 2, \ldots, n$. The mean and the variance of $L$ are
given in Example 1.
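The agreement of the three expressions for the Spearman rank correlation coefficient is easy to check numerically. The sketch below computes all three from paired data; the helper names and the five data pairs are hypothetical, and ties are assumed absent, as they are with probability one for continuous distributions.

```python
from math import sqrt


def ranks(values):
    """Rank of each value within its own sample (1 = smallest; no ties assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_forms(xs, ys):
    """Return the three algebraically equivalent forms of Spearman's coefficient."""
    R, Q = ranks(xs), ranks(ys)
    n = len(R)
    mean = (n + 1) / 2  # common mean of any permutation of 1, ..., n
    num = sum((r - mean) * (q - mean) for r, q in zip(R, Q))
    den = sqrt(sum((r - mean) ** 2 for r in R) * sum((q - mean) ** 2 for q in Q))
    form1 = num / den
    form2 = ((sum(r * q for r, q in zip(R, Q)) - n * (n + 1) ** 2 / 4)
             / (n * (n * n - 1) / 12))
    form3 = 1 - 6 * sum((r - q) ** 2 for r, q in zip(R, Q)) / (n * (n * n - 1))
    return form1, form2, form3


forms = spearman_forms([2.1, 3.5, 1.2, 4.8, 3.9], [1.0, 2.7, 0.4, 2.2, 3.1])
```

For these illustrative pairs the ranks are $(2, 3, 1, 5, 4)$ and $(2, 4, 1, 3, 5)$, so $\sum R_i Q_i = 52$, $\sum (R_i - Q_i)^2 = 6$, and each form equals 0.7.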
EXERCISES

11.41. Use the notation of this section.

(a) Show that the mean and the variance of $L = \sum_{i=m+1}^{N} c(R_i)$ are equal to
the expressions in the text.

(b) In the special case in which $L = \sum_{i=m+1}^{N} R_i$, show that $\mu_L$ and $\sigma_L^2$ are
those of $T$ considered in Section 11.6.

Hint: Recall that

$$\sum_{k=1}^{N} k^2 = \frac{N(N+1)(2N+1)}{6}.$$

11.42. If $X_1, X_2, \ldots, X_n$ is a random sample from a distribution of the
continuous type and if $R_i = \operatorname{rank}(X_i)$, show that the mean and the variance
of $L = \sum iR_i$ are $n(n + 1)^2/4$ and $n^2(n + 1)^2(n - 1)/144$, respectively.

11.43. Verify that the two additional expressions, given in Example 2, for the
Spearman rank correlation coefficient are equivalent to the first one.

Hint: $\sum R_i^2 = n(n + 1)(2n + 1)/6$ and $\sum (R_i - Q_i)^2/2 = \sum (R_i^2 + Q_i^2)/2 -
\sum R_i Q_i$.

11.44. Let $X_1, X_2, \ldots, X_6$ be a random sample of size $n = 6$ from a
distribution of the continuous type. Let $R_i = \operatorname{rank}(X_i)$ and take $a_1 = a_6 = 9$,
$a_2 = a_5 = 4$, $a_3 = a_4 = 1$. Find the mean and the variance of $L = \sum_{i=1}^{6} a_i R_i$,
a statistic that could be used to detect a parabolic trend in $X_1, X_2, \ldots, X_6$.

11.45. Let $R_i$ be the rank of $X_i$, $i = 1, 2, \ldots, 9$. The statistic
$W = (R_1 + R_2 + R_3) + 2(R_4 + R_5 + R_6) + 3(R_7 + R_8 + R_9)$ is used to test
a trend in the data. If, in fact, $X_1, X_2, \ldots, X_9$ are observations of a random
sample from a continuous-type distribution, what are the mean and the
variance of $W$?

11.46. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample of size $n = 5$ from a
continuous-type distribution. Let $R_i$ be the rank of $X_i$, $i = 1, 2, 3, 4, 5$.

(a) Compute the mean and the variance of $L = R_5 - R_1$.

(b) Find the distribution of $L$.

11.47. In the notation of this section show that the covariance of the two
linear rank statistics, $L_1 = \sum_{i=1}^{N} a_i c(R_i)$ and $L_2 = \sum_{i=1}^{N} b_i d(R_i)$, is equal to

$$\sum_{i=1}^{N} (a_i - \bar{a})(b_i - \bar{b}) \sum_{k=1}^{N} (c_k - \bar{c})(d_k - \bar{d}) \Big/ (N - 1),$$

where, for convenience, $d_k = d(k)$.
11.9 Adaptive Nonparametric Methods

Frequently, an investigator is tempted to evaluate several test
statistics associated with a single hypothesis and then use the one
statistic that best supports his or her position, usually rejection.
Obviously, this type of procedure changes the actual significance level
of the test from the nominal $\alpha$ that is used. However, there is a way in
which the investigator can first look at the data and then select a test
statistic without changing this significance level. For illustration,
suppose there are three possible test statistics $W_1, W_2, W_3$ of the
hypothesis $H_0$ with respective critical regions $C_1, C_2, C_3$ such that
$\Pr(W_i \in C_i; H_0) = \alpha$, $i = 1, 2, 3$. Moreover, suppose that a statistic $Q$,
based upon the same data, selects one and only one of the statistics $W_1$,
$W_2$, $W_3$, and that $W$ is then used to test $H_0$. For example, we choose
to use the test statistic $W_i$ if $Q \in D_i$, $i = 1, 2, 3$, where the events defined
by $D_1$, $D_2$, and $D_3$ are mutually exclusive and exhaustive. Now if $Q$ and
each $W_i$ are independent when $H_0$ is true, then the probability of
rejection, using the entire procedure (selecting and testing), is, under
$H_0$,

$$\begin{aligned}
&\Pr(Q \in D_1, W_1 \in C_1) + \Pr(Q \in D_2, W_2 \in C_2) + \Pr(Q \in D_3, W_3 \in C_3) \\
&\quad = \Pr(Q \in D_1)\Pr(W_1 \in C_1) + \Pr(Q \in D_2)\Pr(W_2 \in C_2) + \Pr(Q \in D_3)\Pr(W_3 \in C_3) \\
&\quad = \alpha[\Pr(Q \in D_1) + \Pr(Q \in D_2) + \Pr(Q \in D_3)] = \alpha.
\end{aligned}$$

That is, the procedure of selecting $W_i$ using an independent statistic $Q$
and then constructing a test of significance level $\alpha$ with the statistic $W_i$
has overall significance level $\alpha$.

Of course, the important element in this procedure is the abili ty to 
be able to find a selector Q that is independent of each test statistic W. 
This can frequently be done by using the fact that the complete 
sufficient statistics for the parameters, given by H 0 , are independent of 
every statistic whose distribution is free of those parameters* For 
illustration, if independent random samples of sizes m and n arise from 
two normal distributions with respective means // f and /n 2 and common 
variance a , then the complete sufficient statistics X y Y, and 
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for fi 2 , and a 2 are independent of every statistic whose distribution 
is free of fx u and a 1 such as 


m __ m 

£ ⑶ -Xf £ j^- - median ( 不 )| 

i j ^ j__ 

i(Y^ Yf' 士 I r f -median ⑽ 


range (X U X 2 ,,.,, X m ) 
ranged A ， … ， X) 


Thus, in general, we would hope to be able to find a selector $Q$ that
is a function of the complete sufficient statistics for the parameters,
under $H_0$, so that it is independent of the test statistics.

It is particularly interesting to note that it is relatively easy to use
this technique in nonparametric methods by using the independence
result based upon complete sufficient statistics for parameters. How can
we use an argument depending on parameters in nonparametric
methods? Although this does sound strange, it is due to the unfortunate
choice of a name in describing this broad area of nonparametric
methods. Most statisticians would prefer to describe the subject as
being distribution-free, since the test statistics have distributions that
do not depend on the underlying distribution of the continuous type,
described by either the distribution function $F$ or the p.d.f. $f$. In
addition, the latter name provides the clue for our application here
because we have many test statistics whose distributions are free of the
unknown (infinite vector) “parameter” $F$ (or $f$). We now must find
complete sufficient statistics for the distribution function $F$ of the
continuous type. In many instances, this is easy to do.
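
A small computation illustrates the distribution-free property: a statistic that depends on the data only through ranks is unchanged when every observation is replaced by a strictly increasing function of itself, so its null distribution cannot depend on $F$. The Python sketch below is an illustration under the assumption of no ties (as holds with probability one for the continuous type); the function names and data are hypothetical.

```python
import math


def ranks(values):
    """Return the rank (1 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_sum_of_second_sample(x, y):
    """Sum of the ranks of the y-values in the combined sample."""
    combined_ranks = ranks(list(x) + list(y))
    return sum(combined_ranks[len(x):])


# Hypothetical data and a strictly increasing transformation of it.
x = [0.3, 1.9, 0.7]
y = [1.1, 2.5, 0.9, 3.0]
x_t = [math.exp(v) for v in x]
y_t = [math.exp(v) for v in y]

print(rank_sum_of_second_sample(x, y), rank_sum_of_second_sample(x_t, y_t))
```

Both calls give the same rank sum, since the transformation changes the observations but not their order.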

In Exercise 7.50, Section 7.7, it is shown that the order statistics
$Y_1 < Y_2 < \cdots < Y_n$ of a random sample of size $n$ from a distribution
of the continuous type with p.d.f. $F'(x) = f(x)$ are sufficient statistics
for the “parameter” $f$ (or $F$). Moreover, if the family of distributions
contains all probability density functions of the continuous type,
the family of joint probability density functions of $Y_1, Y_2, \ldots, Y_n$ is
also complete. We accept this latter fact without proof, as it is beyond
the level of this text; but doing so, we can now say that the order
statistics $Y_1, Y_2, \ldots, Y_n$ are complete sufficient statistics for the
parameter $f$ (or $F$).

Accordingly, our selector $Q$ will be based upon those complete
sufficient statistics, the order statistics under $H_0$. This allows us to
independently choose a distribution-free test appropriate for this type
of underlying distribution, and thus increase our power. Although it
is well known that distribution-free tests hold the significance level $\alpha$
for all underlying distributions of the continuous type, they have often
been criticized because their powers are sometimes low. The
independent selection of the distribution-free test to be used can help
correct this. So selecting, or adapting the test to the data, provides
a new dimension to nonparametric tests, which usually improves the
power of the overall test.

A statistical test that maintains the significance level close to a
desired significance level $\alpha$ for a wide variety of underlying distributions
with good (not necessarily the best for any one type of distribution)
power for all these distributions is described as being robust. As an
illustration, the $T$ (Student's $t$) used to test the equality of the means
of two normal distributions is quite robust provided that the underlying
distributions are rather close to normal ones with common variance.
However, if the class of distributions includes those that are not too
close to normal ones, such as the Cauchy distribution, the test based
upon $T$ is not robust; the significance level is not maintained and the
power of the $T$-test is low with Cauchy distributions. As a matter of
fact, the test based on the Mann-Whitney-Wilcoxon statistic (Section
11.6) is a much more robust test than that based upon $T$ if the class
of distributions is fairly wide (in particular, if long-tailed distributions
such as the Cauchy are included).

An illustration of this adaptive distribution-free procedure that is
robust is provided by considering a test of the equality of two
distributions of the continuous type. From the discussion in Section
11.8, we know that we could construct many linear rank statistics by
changing the scoring function. However, we concentrate on three such
statistics mentioned explicitly in that section: that based on normal
scores, say $L_1$; that of Mann-Whitney-Wilcoxon, say $L_2$; and that of
the median test, say $L_3$. Moreover, respective critical regions $C_1$, $C_2$,
and $C_3$ are selected so that, under the equality of the two distributions,
we have

$$
\alpha = \Pr(L_1 \in C_1) = \Pr(L_2 \in C_2) = \Pr(L_3 \in C_3).
$$

Of course, we would like to use the test given by $L_1 \in C_1$ if the tails of
the distributions are like or shorter than those of the normal
distributions. With distributions having somewhat longer tails, $L_2 \in C_2$
provides an excellent test. And with distributions having very long tails,
the test based on $L_3 \in C_3$ is quite satisfactory.

In order to select the appropriate test in an independent manner we
let $V_1 < V_2 < \cdots < V_N$, where $N = m + n$, be the order statistics of
the combined sample, which is of size $N$. Recall that if the two
distributions are equal and thus have the same distribution function
$F$, these order statistics are the complete sufficient statistics for the
parameter $F$. Hence every statistic based on $V_1, V_2, \ldots, V_N$ is
independent of $L_1$, $L_2$, and $L_3$, since the latter statistics have distributions
that do not depend upon $F$. In particular, the kurtosis (Exercise 1.102,
Section 1.9) of the combined sample,

$$
K = \frac{\dfrac{1}{N} \sum_{i=1}^{N} (V_i - \bar{V})^4}{\left[ \dfrac{1}{N} \sum_{i=1}^{N} (V_i - \bar{V})^2 \right]^2},
$$

is independent of $L_1$, $L_2$, and $L_3$. From Exercise 3.64, Section 3.4, we
know that the kurtosis of the normal distribution is 3; hence if the two
distributions were equal and normal we would expect $K$ to be about
3. Of course, a longer-tailed distribution has a bigger kurtosis. Thus
one simple way of defining the independent selection procedure would
be to let

$$
D_1 = \{k : k \le 3\}, \qquad D_2 = \{k : 3 < k \le 8\}, \qquad D_3 = \{k : 8 < k\}.
$$

These choices are not necessarily the best way of selecting the
appropriate test, but they are reasonable and illustrative of the
adaptive procedure. From the independence of $K$ and $(L_1, L_2, L_3)$, we
know that the overall test has significance level $\alpha$. Since a more
appropriate test has been selected, the power will be relatively good
throughout a wide range of distributions. Accordingly, this
distribution-free adaptive test is robust.
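
The adaptive procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the scores used here (normal scores $\Phi^{-1}[i/(N+1)]$, Wilcoxon scores $i$, and median scores equal to 1 above the middle rank and 0 otherwise) are common forms that may differ in detail from those of Section 11.8; the observations are assumed to have no ties; a one-sided permutation p-value (large values of the statistic for the second sample) is reported instead of a fixed critical region; and the exhaustive enumeration is practical only for small samples. All function names are hypothetical.

```python
from itertools import combinations
from statistics import NormalDist, mean


def kurtosis(v):
    """Sample kurtosis K: fourth central moment over squared second moment."""
    vb = mean(v)
    m2 = mean((t - vb) ** 2 for t in v)
    m4 = mean((t - vb) ** 4 for t in v)
    return m4 / m2 ** 2


def select_scores(k, n_total):
    """Choose a scoring function from the kurtosis k (regions D1, D2, D3)."""
    positions = range(1, n_total + 1)
    if k <= 3:
        inv = NormalDist().inv_cdf
        return [inv(i / (n_total + 1)) for i in positions], "normal scores"
    if k <= 8:
        return [float(i) for i in positions], "Wilcoxon"
    return [1.0 if i > (n_total + 1) / 2 else 0.0 for i in positions], "median"


def adaptive_test(x, y):
    """Return (name of selected test, observed statistic, one-sided p-value)."""
    combined = sorted(list(x) + list(y))  # order statistics V_1 < ... < V_N
    n_total = len(combined)
    scores, name = select_scores(kurtosis(combined), n_total)
    position = {value: i for i, value in enumerate(combined)}  # assumes no ties
    observed = sum(scores[position[value]] for value in y)
    # Under H0, given the order statistics, every set of len(y) positions
    # is equally likely to belong to the second sample.
    null = [sum(scores[i] for i in subset)
            for subset in combinations(range(n_total), len(y))]
    p_value = sum(s >= observed - 1e-12 for s in null) / len(null)
    return name, observed, p_value


if __name__ == "__main__":
    print(adaptive_test([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Because the kurtosis depends only on the order statistics of the combined sample, it is the same for every relabeling considered in the permutation distribution, which mirrors the independence argument given above.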

EXERCISES

11.48. Let $F(x)$ be a distribution function of a distribution of the continuous
type which is symmetric about its median $\xi$. We wish to test $H_0 : \xi = 0$
against $H_1 : \xi > 0$. Use the fact that the $2n$ values, $X_i$ and $-X_i$,
$i = 1, 2, \ldots, n$, after ordering, are complete sufficient statistics for $F$,
provided that $H_0$ is true. Then construct an adaptive distribution-free test
based upon Wilcoxon's statistic and two of its modifications given in
Exercises 11.23 and 11.24.

11.49. Suppose that the hypothesis $H_0$ concerns the independence of two
random variables $X$ and $Y$. That is, we wish to test $H_0 : F(x, y) = F_1(x)F_2(y)$,
where $F$, $F_1$, and $F_2$ are the respective joint and marginal distribution
functions of the continuous type, against all alternatives. Let $(X_1, Y_1)$,
$(X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from the joint distribution.
Under $H_0$, the order statistics of $X_1, X_2, \ldots, X_n$ and the order statistics of
$Y_1, Y_2, \ldots, Y_n$ are, respectively, complete sufficient statistics for $F_1$ and $F_2$.
Use Spearman's statistic (Example 2, Section 11.8) and at least two
modifications of it to create an adaptive distribution-free test of $H_0$.
Hint: Instead of ranks, use normal and median scores (Section 11.8) to
obtain two additional correlation coefficients. The one associated with the
median scores is frequently called the quadrant test.

ADDITIONAL EXERCISES 

11.50. Let $Y_1 < Y_2 < \cdots < Y_5$ be the order statistics of a random sample of
size $n = 5$ from a distribution of the continuous type with distribution
function $F$. Compute $\Pr\{\ldots + [1 - F(Y_4)] > \ldots\}$.

11.51. Let $Y_1 < Y_2 < \cdots < Y_8$ be the order statistics of a random sample of
size $n = 8$ from a distribution of the continuous type with median $\xi$.
Compute $\Pr(Y_2 < \xi < Y_7)$.

11.52. Let $X_1, X_2, \ldots, X_8$ be a random sample of size $n = 8$ from a symmetric
distribution of the continuous type with distributional median equal to
zero. Modify the regular one-sample Wilcoxon $W$ by replacing the ranks
1, 2, 3, 4, 5, 6, 7, 8 by the scores 1, 1, 2, 2, 2, 2, 3, 3, to obtain $W_g$. Compute
the mean and variance of $W_g$.

11.53. Let $Y_1 < Y_2 < Y_3 < Y_4$ be the order statistics of a random sample of
size $n = 4$ from a continuous-type distribution with distribution function
$F(x)$ and unknown 75th percentile $\xi_{0.75}$.

(a) What is $\Pr(Y_3 < \xi_{0.75} < Y_4)$?

(b) What is the p.d.f. of $F(Y_4) - F(Y_3)$?

11.54. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a continuous-type
distribution that is symmetric about zero. If we modify the one-sample
Wilcoxon by replacing the ranks 1, 2, 3, 4, 5 by the scores 1, 1, 1, 3, 4, what
is the m.g.f. of this new statistic?

11.55. Let $X_1, X_2$ and $Y_1, Y_2$ be independent random samples, each of size
$n = 2$, from distributions with respective probability density functions
$f(x) = 1$, $0 < x < 1$, zero elsewhere, and $g(y) = 3y^2$, $0 < y < 1$, zero
elsewhere. Compute the probability that the ranks of the $Y$-values in the
combined sample of size 4 are 2 and 4.

11.56. Let $X_1, X_2, \ldots, X_6$ be a random sample of size $n = 6$ from a
continuous-type distribution with distribution function $F(x)$. Let
$R_i = \text{rank}(X_i)$ and consider the scores $c(1) = c(2) = 1$, $c(3) = 2$, $c(4) = 3$,
$c(5) = c(6) = 4$.

(a) What are the mean and the variance of $L = c(R_4) + c(R_5) + c(R_6)$?

(b) Why are $L$ and the sample range $R = \max(X_i) - \min(X_i)$ independent?

11.57. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a distribution with
p.d.f. $f(x) = e^{-x}$, $0 < x < \infty$, zero elsewhere. Find the probability that both
$X_3$ and $X_5$ are less than $X_1$, $X_2$, and $X_4$. Is this answer the same for every
underlying distribution of the continuous type?

11.58. Let $X_1, X_2, \ldots, X_5$ be a random sample of size $n = 5$ from a
distribution of the continuous type. Let $R_i = \text{rank}(X_i)$ among
$X_1, X_2, \ldots, X_5$. Find the mean and variance of

$$
L = R_1 + 2(R_2 + R_3 + R_4) + 3R_5.
$$

11.59. Let $X_1, X_2, X_3$ denote a random sample of size $n = 3$ from a
continuous-type distribution with distribution function $F$. It is well known
that the order statistics $Y_1 < Y_2 < Y_3$ are complete sufficient statistics for $F$.

(a) Let the statistic $U$ be defined as follows:

$$
U = \begin{cases} 1 & \text{if } X_1 \text{ is the sample median } Y_2, \\ 0 & \text{otherwise.} \end{cases}
$$

Find the distribution of $U$.

(b) Argue that $U$ and $(Y_1, Y_2, Y_3)$ are independent.

(c) Argue that $U$ and $\bar{X}$ are independent.

11.60. Let $X$ have a p.d.f. $f(x)$ of the continuous type that is symmetric about
zero; that is, $f(x) = f(-x)$ for all real $x$. Show that the joint m.g.f.

$$
E\{\exp[t_1 |X| + t_2 \,\text{sign}(X)]\} = \left[ 2 \int_0^{\infty} e^{t_1 x} f(x)\, dx \right] \left[ \frac{e^{-t_2} + e^{t_2}}{2} \right],
$$

and thus $|X|$ and $\text{sign}(X)$ are independent.

11.61. Let $X - \theta \overset{d}{=} \theta - X$ mean that $X - \theta$ has the same distribution as
$\theta - X$; thus $X$ has a symmetric distribution about $\theta$. Say that $Y$ and $X$ are
independent random variables and $Y$ has a distribution which is also
symmetric about $\theta$. Show that $X - Y$ has a distribution that is symmetric
about zero.

Hint: Write $X - Y = X - \theta - (Y - \theta) \overset{d}{=} \theta - X - (\theta - Y) = Y - X$.
11.62. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution that is
symmetric about $\theta$. Let $W$ and $T$ be two statistics enjoying the following
properties, respectively:

$$
W(x_1, \ldots, x_n) = W(-x_1, \ldots, -x_n), \qquad
W(x_1 + h, \ldots, x_n + h) = W(x_1, \ldots, x_n),
$$

so that $W$ is an even location-invariant statistic like $S^2$ or the range;

$$
T(x_1, \ldots, x_n) = -T(-x_1, \ldots, -x_n), \qquad
T(x_1 + h, \ldots, x_n + h) = T(x_1, \ldots, x_n) + h,
$$

so that $T$ is an odd location statistic like $\bar{X}$ or the median. Show that

$$
[T(X_1, \ldots, X_n) - \theta,\; W(X_1, \ldots, X_n)] \overset{d}{=} [\theta - T(X_1, \ldots, X_n),\; W(X_1, \ldots, X_n)].
$$

Hint: Write the left-hand member as

$$
[T(X_1 - \theta, \ldots, X_n - \theta),\; W(X_1 - \theta, \ldots, X_n - \theta)],
$$

using the properties of $T$ and $W$. Then use the fact that
$X_i - \theta \overset{d}{=} \theta - X_i$, $i = 1, 2, \ldots, n$, to substitute $\theta - X_i$ for $X_i - \theta$.


11.63. The result of Exercise 11.62 implies that $T$ has a conditional
distribution that is symmetric about $\theta$, given $W = w$. Of course, $T$ has an
unconditional symmetric distribution about $\theta$. Moreover, it also implies
that if appropriate expectations exist, $E(T \mid W = w) = \theta$ and $\text{cov}(T, W) = 0$.
Suppose that $T_1, T_2, \ldots, T_k$ and $W_1, W_2, \ldots, W_k$ represent $k$ such $T$ and
$W$ statistics, such that $W_1 + \cdots + W_k = 1$.

(a) Show that $E\left[ \sum_{i=1}^{k} W_i T_i \right] = \theta$.

(b) Let $T_1 = \bar{X}$ and $T_2 = m$, the sample median. Let $W_1 = 1$ if $K < 4$ and
zero otherwise, where $K$ is the sample kurtosis, and let $W_2 = 1 - W_1$.
Consider $T = \sum_{i=1}^{2} W_i T_i$. Is its expectation equal to $\theta$? If so, note that $T$
is an adaptive unbiased estimator which equals $\bar{X}$ for certain values
and $m$ for others.
Tables


TABLE I

The Poisson Distribution

$$
\Pr(X \le x) = \sum_{w=0}^{x} \frac{\mu^w e^{-\mu}}{w!}
$$

Rows: $x$. Columns: $\mu = E(X)$ = 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0.
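
Entries of a cumulative Poisson table of this kind can be computed directly from the defining sum. The short Python sketch below is offered only as an illustration; the function name `poisson_cdf` is a hypothetical choice.

```python
import math


def poisson_cdf(x, mu):
    """Pr(X <= x) for a Poisson random variable with mean mu (x a nonnegative integer)."""
    return sum(mu ** w * math.exp(-mu) / math.factorial(w) for w in range(x + 1))


if __name__ == "__main__":
    # Cumulative probabilities for a few values of x with mu = 2.0.
    for x in range(5):
        print(x, round(poisson_cdf(x, 2.0), 3))
```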
TABLE II

The Chi-Square Distribution*

$$
\Pr(X \le x) = \int_0^x \frac{1}{\Gamma(r/2)\, 2^{r/2}}\, w^{r/2 - 1} e^{-w/2}\, dw
$$

                        Pr (X ≤ x)
  r     0.01    0.025   0.050    0.95   0.975    0.99

  1    0.000    0.001   0.004    3.84    5.02    6.63
  2    0.020    0.051   0.103    5.99    7.38    9.21
  3    0.115    0.216   0.352    7.81    9.35    11.3
  4    0.297    0.484   0.711    9.49    11.1    13.3
  5    0.554    0.831    1.15    11.1    12.8    15.1
  6    0.872     1.24    1.64    12.6    14.4    16.8
  7     1.24     1.69    2.17    14.1    16.0    18.5
  8     1.65     2.18    2.73    15.5    17.5    20.1
  9     2.09     2.70    3.33    16.9    19.0    21.7
 10     2.56     3.25    3.94    18.3    20.5    23.2
 11     3.05     3.82    4.57    19.7    21.9    24.7
 12     3.57     4.40    5.23    21.0    23.3    26.2
 13     4.11     5.01    5.89    22.4    24.7    27.7
 14     4.66     5.63    6.57    23.7    26.1    29.1
 15     5.23     6.26    7.26    25.0    27.5    30.6
 16     5.81     6.91    7.96    26.3    28.8    32.0
 17     6.41     7.56    8.67    27.6    30.2    33.4
 18     7.01     8.23    9.39    28.9    31.5    34.8
 19     7.63     8.91    10.1    30.1    32.9    36.2
 20     8.26     9.59    10.9    31.4    34.2    37.6
 21     8.90     10.3    11.6    32.7    35.5    38.9
 22     9.54     11.0    12.3    33.9    36.8    40.3
 23     10.2     11.7    13.1    35.2    38.1    41.6
 24     10.9     12.4    13.8    36.4    39.4    43.0
 25     11.5     13.1    14.6    37.7    40.6    44.3
 26     12.2     13.8    15.4    38.9    41.9    45.6
 27     12.9     14.6    16.2    40.1    43.2    47.0
 28     13.6     15.3    16.9    41.3    44.5    48.3
 29     14.3     16.0    17.7    42.6    45.7    49.6
 30     15.0     16.8    18.5    43.8    47.0    50.9

*This table is abridged and adapted from “Tables of Percentage Points of the Incomplete Beta
Function and of the Chi-Square Distribution,” Biometrika, 32 (1941). It is published here with
the kind permission of Professor E. S. Pearson on behalf of the author, Catherine M. Thompson,
and of the Biometrika Trustees.
TABLE III

The Normal Distribution

$$
\Pr(X \le x) = \Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-w^2/2}\, dw
\qquad [\Phi(-x) = 1 - \Phi(x)]
$$

   x     Φ(x)        x     Φ(x)        x     Φ(x)

  0.00   0.500     1.10    0.864     2.05    0.980
  0.05   0.520     1.15    0.875     2.10    0.982
  0.10   0.540     1.20    0.885     2.15    0.984
  0.15   0.560     1.25    0.894     2.20    0.986
  0.20   0.579     1.282   0.900     2.25    0.988
  0.25   0.599     1.30    0.903     2.30    0.989
  0.30   0.618     1.35    0.911     2.326   0.990
  0.35   0.637     1.40    0.919     2.35    0.991
  0.40   0.655     1.45    0.926     2.40    0.992
  0.45   0.674     1.50    0.933     2.45    0.993
  0.50   0.691     1.55    0.939     2.50    0.994
  0.55   0.709     1.60    0.945     2.55    0.995
  0.60   0.726     1.645   0.950     2.576   0.995
  0.65   0.742     1.65    0.951     2.60    0.995
  0.70   0.758     1.70    0.955     2.65    0.996
  0.75   0.773     1.75    0.960     2.70    0.997
  0.80   0.788     1.80    0.964     2.75    0.997
  0.85   0.802     1.85    0.968     2.80    0.997
  0.90   0.816     1.90    0.971     2.85    0.998
  0.95   0.829     1.95    0.974     2.90    0.998
  1.00   0.841     1.960   0.975     2.95    0.998
  1.05   0.853     2.00    0.977     3.00    0.999
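
Values of $\Phi(x)$ like those tabulated here can be reproduced from the error function, since $\Phi(x) = \frac{1}{2}[1 + \text{erf}(x/\sqrt{2})]$. The Python sketch below is an illustration only; the function name `normal_cdf` is a hypothetical choice.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    for x in (0.0, 1.0, 1.645, 1.96, 2.576):
        print(x, round(normal_cdf(x), 3))
```

The symmetry relation $\Phi(-x) = 1 - \Phi(x)$ explains why only nonnegative values of $x$ need to be tabulated.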
TABLE IV

The t-Distribution*

$$
\Pr(T \le t) = \int_{-\infty}^{t} \frac{\Gamma[(r+1)/2]}{\sqrt{\pi r}\, \Gamma(r/2)\, (1 + w^2/r)^{(r+1)/2}}\, dw
\qquad [\Pr(T \le -t) = 1 - \Pr(T \le t)]
$$

                      Pr (T ≤ t)
  r     0.90     0.95     0.975     0.99     0.995

  1    3.078    6.314    12.706   31.821    63.657
  2    1.886    2.920     4.303    6.965     9.925
  3    1.638    2.353     3.182    4.541     5.841
  4    1.533    2.132     2.776    3.747     4.604
  5    1.476    2.015     2.571    3.365     4.032
  6    1.440    1.943     2.447    3.143     3.707
  7    1.415    1.895     2.365    2.998     3.499
  8    1.397    1.860     2.306    2.896     3.355
  9    1.383    1.833     2.262    2.821     3.250
 10    1.372    1.812     2.228    2.764     3.169
 11    1.363    1.796     2.201    2.718     3.106
 12    1.356    1.782     2.179    2.681     3.055
 13    1.350    1.771     2.160    2.650     3.012
 14    1.345    1.761     2.145    2.624     2.977
 15    1.341    1.753     2.131    2.602     2.947
 16    1.337    1.746     2.120    2.583     2.921
 17    1.333    1.740     2.110    2.567     2.898
 18    1.330    1.734     2.101    2.552     2.878
 19    1.328    1.729     2.093    2.539     2.861
 20    1.325    1.725     2.086    2.528     2.845
 21    1.323    1.721     2.080    2.518     2.831
 22    1.321    1.717     2.074    2.508     2.819
 23    1.319    1.714     2.069    2.500     2.807
 24    1.318    1.711     2.064    2.492     2.797
 25    1.316    1.708     2.060    2.485     2.787
 26    1.315    1.706     2.056    2.479     2.779
 27    1.314    1.703     2.052    2.473     2.771
 28    1.313    1.701     2.048    2.467     2.763
 29    1.311    1.699     2.045    2.462     2.756
 30    1.310    1.697     2.042    2.457     2.750

*This table is abridged from Table III of Fisher and Yates: Statistical Tables for Biological,
Agricultural, and Medical Research, published by Oliver and Boyd, Ltd., Edinburgh, by
permission of the authors and publishers.
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APPENDIX 


Answers to Selected 
Exercises 


CHAPTER 1 

1.1 (a) $\{x : x = 0, 1, 2, 3, 4\}$;

$\{x : x = 2\}$.

(b) $\{x : 0 < x < 3\}$;

$\{x : 1 \le x < 2\}$.

1-2 

(a) {x : 0 < x < |}* 

L7 

(a) {x:0 < x <3}. 


(b) {(x y 少 )： 0 < x 2 


+ y 2 < 4}, 

1.8 

(a) {x : x = 2}, 


(b) NulJ set. 


⑻ {(x t y):x z + y 2 -^ 

1-9 

80 , 1 
si ^ 

UO 

M ; 0; L 

1.11 

|; 0; tt/2. 

1,12 

5 ； 0; | . 

1.13 

i;o. 

U5 

10. 

U8 

1 . f , I - 4 

4 ， i3 ，， n - 

L19 

11 * 立 ■丄 * ^ 

32 ^ 64 » 32 * 64 - 

1.20 

03. 

1.21 

e^\ 1 — d L 


1.22 

1.26 


1.27 

h29 



134 ( a )j_(b)|. 


(c) 


：0/0： 


{5/(8 - x)l 


137 I. 

138 |； f . 

1.39 ^ 

1*40 f ，令. 

1,42 (a) 0J8. (b) 0.72. 
(c) 0.88. 
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U-2x 

36 


l/2p; f ; f ; 5; 50. 

3£ Wl 

n ，兩 ， 

5 Jl 
f ? 192 - 

0-84, 


2 , 


9 


， 2, 3, 4, 5. 


5,7*3 

f 5 8 » 8 " 

e^ 2 -e~\ 

士， 1; i ， ff ， 

⑻ L (b) f, ⑻ 2. 

⑻ 0, x<0; 1—(l-x ) 3 ， 
0^x< I; 1, 1 <x; 

1 -來 、 1 -於 

(a) Mb) Me) \-(d) 0. 
0, 少 <0;〆, 0 <j^<1; I, 

i <y, 2y, 0<y<l; 

0 elsewhere, 

1, i 
2^4* 

0, x<0; 1 -e- x !l0<x. 
je~ JC ( 2 0<x; 0 elsewhere. 

l/3v^ ， 0<y<l; 1/6^/y , - 
1 <j^<4; 0 elsewhere, 

2; 86*4; _160.8, 

3; II; 27, 
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1,83 






85 

88 

89 

90 
99 

101 


⑻ [ 

$7.80. 

(a) 1.5, 0.75, (b) 0,5, 0,05 
(c) 2; does not exist ‘ 
e r /(2_e r ), t<\n 2; 2; 2. 

10; 0; 2; —30, 

2^/2 ijl 


1.54 


L56 


1*59 

1,61 

1.63 

1.64 
1.66 
1,69 


1.71 

L72 


1,74 

1.76 

1.79 

1.80 
1,81 


CHAPTER 2 


2*1 萏；0;|;!* 

2.2 I, 

2.6 ze~\ 0<z< oo; 

0 elsewhere* 

2.7 —In z, 0<z<l; 

0 elsewhere. 

2*10 5x^, 0<x 2 < I; 

0 elsewhere. 

2.II (3^+2)/(6^,+3); 

3x 2 /4; 3 冷 80. 

(b) 1/e. 

(a) Lfb) 0 ， 

(a) 


si 


2,13 
2.18 
2.20 
2*21 

2.31 

2.32 l ， 

2.36 I ， 

2.38 ⑻！ ，()• 

Z39 0^y<U 

I2(l^) n , 0<y<K 
2.40 g(y) = {y 3 -iy-lf]/6\ 
y= 1,2, 3,4, 5, 6. 

2.42 i>2 = (Ti(pi2^ p13^23) / 

b^a^{p n -p n p tl )j 




1.45 0J029 for (a), (b) ，（ c) ，（ d). 



⑻ 

0.4116, 

1.46 

i : 
4 1 : 

i 

i * 

1.47 

9 

TI ， 

i \ r 
13 * IT » B 

IAS 

⑻ 

! 肩 2 」 

1.49 

1 l 

i 


13V 39 

x A 5—A ： 


1.51 


(?) 


x = 0, I, 




If 

Jc o:， 
, • II 

1T04TOJC 
⑻ (bJ&36» 


0305 1014 


CHAPTER 3 


3.1 

3.4 

3*6 

3.8 

3.10 

3.13 

3.14 
3.17 
3J8 

3.20 

3.21 
3*22 


40 
8l ■ 

147 

512 


16 

65 

IT 


(j)(lr" 3 s x=3, 4, 5，.“ 


72 


625 ' 

呈； W 2; 

25 


0,09. 


3.25 4 x e^ 4 lxlx^0 7 1,2, 

3.26 0.84. 

3.31 2, 


3.33 (a) exp 【一 2+#0+〆》)】, 

(b) a = I s /i 2 = 2, 

' a, = al — 2 r 

(c) y/2. 

3.34 0.05. : 

3.35 0.831, ]2.8. 

336 0.90, 

3.37 f(4). 

3.39 3^~ 3> , 0<^<oo, 

3.40 2, 0.95. 

3*45 {I , 

3.46 X \2l 

3-49 0.067; 0.685. 

3.51 71.3, 189.7. 

3.52 ^/hil/n. 

3.57 0.774. 

3*58 y/ljn; {k — 2)jn, 

3.59 0.90.

3.60 0.477.

3.61 0,46 K 
3.62 釋， 1), 

3.63 0.433 ， 

3.64 0; 3. 

3*69 MO, 2). 

3.70 (a) 0.574. (b) 0.735.

3.71 (a) 0.264. (b) 0.440.
(c) 0.433. (d) 0.642.

3.73 

P=! * 

r 

3,74 

(38.2,43.4). 

CHAPTER 4 

- 

4,2 

405 

P024 ^ 


4.3 

0.405. 


4.6 

t6 

Is * 


4.7 

i 

R * 


49 

(«+I)/2; (« 2 ~ 1)/12, 

410 

a + hx\ b 2 夂. 

4.11 

Z 2 ( 2 ), 

4 

414 

5, 0<y<\ 

l ； 


l/2y\ I < 少 <00. 

415 

y 5 , 0<y< I; I5y l4 f 


0<y< 1. 


4J6 

4 

1 • 


4.17 

1^-3, 5, 7. 

4.19 

( 的产 1,8, 21，… 

4.20 

y\ 

grOi) 


p 

i 

r 

36 


2 

4 

36 


3 

6 

36 


4 

4 

36 


6 

12 

36 


9 

9 

36 

4-25 

■fi i 0 <y <27. 

432 

y\ e^ y \ 0<^]<oo. 

4.34 

(2 yi )(4A)，0<y } <U 


0<y 2 < K 


435 

a/(« + /?); 



«M(a + y?+0(a+7?) 2 ]^ 

4.36 (a) 20. (b) 1260. (c) 495. 

4_37 晶 . 

4.40 0,05, 

4.43 1/4.74, 3*33. 2 

4.48 (l/^tyyie-^ 2 sin^, 

0 <y t < oo f 0<^ 2 <2h, 

0<yj<n. 
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4.1 ❶ 3 0.78. 

4.104 l；l 

4.105 7. 

4.107 2*5; 0,25, 

4109 —5; 60-12^. 

4.11 ❶ + 

4,113 0.265. 

4.115 22.5, 誓， 

4J16 r 2 >4. _ 

4*118 + + , 

4.121 5/^* 

4125 #+ 办；户抑 2 (〆 一 i). 


4.49 y 2 yie^\ 0<^<l s 
0<y 2 <U 0<^ 3 <oo. 
4.53 11(2^1 0<y<L 

454 e-^l(2n^/^ 2 l 

-\fyi<yi<s/yu 
0<jj <oo* 

4,56 1-(1-^ 3 ) 4 , 

457 I 
4.62 咅 ■ 

4*63 48z|Z^, 0<Zj < I, 
0<2 2 <I, 0<z 3 <K 

4,64 長 . 

4.69 f 

4.70 6uv(u + v), 
0<u<v<l. 


475 y 

g(y) 

2 

! 

36 

3 

36 

4 

3 

36 

5 

■T 

36 

j6 

5 

36 

A 

7 

36 

• 

5 

36 

9 

4 

36 

10 

3 

36 

n 

2 

36 

12 

1 

36 

4.76 CU4. 


4.79 0.159. 

4.82 0,159. 

4.88 0.818, 

4.91 (b) ^ 

—1 or 1 


(c) Z i —a i Yi+fij. 
4.92 |>A=0- 

i 

4.94 6.4L 

4.95 rt = 16, 

4*97 (n-iy/n; 

2(n-l)<r 4 /n\ 

4M 0.90. 

4.100 0.945. 

4.102 0.61 & 


CHAPTER 5

5.1 Degenerate at fx. 

5.2 Gamma (a — 1, 1), 

5.3 Gamma ($\alpha = 1$, $\beta = 1$).

5.4 Gamma ($\alpha = 2$, $\beta = 1$).

5.13 0.682.

5.14 (b) 0.815.

5.17 Degenerate at n 2 

5.18 (b) $N(0, 1)$.

5.19 (b) $N(0, 1)$.

5.21 0.954.

5.23 0.840.

5.26 0.08. 

5.28 0.267. 

5.29 0.682. 

5.35 N(0, 1). 


CHAPTER 6 

6*1 (a) £ 

(b) z nlln(X l X 2 - -X n y 

(c) X. (d) The median, 

(e) The first order statistic. 
6,2 The first order statistic Y u 

i 


£L a ± 11 ± 

0 _4 25 ' 25 ' 25 * 

6,5 Fj^min (JT,); 

nl\n[{X x X 2 yX n )IY\l 
6-7 (b) Xi(\-X). (d) r 

(e) 无 —_ 1 , 

6.9 1 —e -1 K 

6 JO Multiply by nj{n— 1), 
6 J2 (^+^)/ 2 ,(/^ ^)/ 2 ; 

£[(} ； -^)/ 2 ] = 〆《-!)/ 
(«+!)- 

6A4 (77,28, 85J 2), 

6.15 24 or 25, 

6.16 (3/7, 5_7), 

6.17 160. 

6,23 (55/6, 55/4), 

6.25 1692. 

6.26 3.19 to 3.61 ■ 

6.28 3.92 to 31.50, 

6.30 (-3.6,10). 

05 135 or 136. 

6_38 5 +1 In I; j^+l In | - 
6.39 (31)3 8 /4 9 . 

6.42 n = 19 or 20 , 

6.43 K({)^0M2; 

. 艰 ) = 0.92(K 

6.44 n^73, cs42. 

6.46 (a) Reject. 

(b) /?-value^ 0,005, 

6.49 (c) Rvalue % 0,005. 

6.51 23.3. 

6.52 2,9L 

6*53 % = @>7.81, 
reject // 0 , 

6.55 6<8 or 32< b. 

6.56 ^ 3 =y< 11.3, 
accept Hq. 

6*57 6*4 <9,49, accept H Q . 
6,59 p^(X [ + X 1 /2) / 

(X { +X 2 + X^ 

CHAPTER 7 

7.4 If. 

7.5 d { (y). 


7.6 6 = 0 ; does not exist, 

7.7 Does not exist. 

7.17 n [Xd -m 

i — i 


7.19 60/ 3 (y 5 -y,)/e s ;6y 5 /5 ： 

e 2 p ； e 2 /35. 

7.20 (l/e 2 )e^\ 

0< woo; 

yj2 ； e 2 /2. 

7.22 I Xj/n; S Xjn; (n+ IJY^n 

7.24 X;X. 

7.25 YJn. 

7.27 Yy-ljn. 

7.29 Y t = t^ YJ4n;yes. 

737 x, 

7.40 X 2 - 1 jn. 



7.51 


7.55 


Y\ + Y n U+l)(m) 
~2 ~ 1 ^2(n-l) 


CHAPTER 8 

8.2 lyr 1 + 〆/«]/( t 2 + a 2 jn). 
8.3 沉 y + a 獅 +1). 

8*13 $ 2 /n; $ 2 /n(n + 2). 

8.15 (a) 4/0\ 

8.17 (d) var(^)=^=^. 

8.22 2,17; 2.44. 

8.25 2*20. 


CHAPTER 9 
to 

9.4 > I 8 J; yes; yes* 











10 to 

9.6 3 + 

i i 

9*7 95 or 96; 76.7. 

93 38 or 39; 15. 

9.10 0.08; 0.875, 

9 ji (i-ef(i+9e). 

9.12 h0<$<^; I/(16^) ? 
|<0<1; 1-15/(160 )， 

1^9. 

9.14 53 oi* 54, 5A 

9.17 Reject// 0 if x> 77.564. 

9.18 26 or 27; 

reject H 0 if x< 24. 

9.19 220 or 221; 

reject if 少 > 17* 

9.23 t — 3> 2.262, reject H 0 . 

9.24 |/|-Z27>2J45, 
reject H 0 . 

937 c 0 (w) —(14>4) 

x(nln 1.5 _ln 9,5); 
C|(w) — (14,4) 

x (n In 1.5 +In 18). 
9*38 c 0 (n) = (0.05fi_ in 8)/ln 3.5; 

C| (n) = (0.05n—In 4,5)/ln 3.5. 
9.41 (b) c=0.18; 0.H 

(c) c—0,5; 0J6; 0,84, 

(d) e=0.23; 0.06; 0.68. 

9.44 (9^^20jc)/30<^ 
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CHAPTER 10 

10.9 639. 

10.12 r+0 ， 2r+46, 

10.13 r 2 (0+r 1 )/[r 1 (r 2 -2)]. 

10.23 7.00, 9.98.

10.25 4.79, 22.82, 30.73. 

10.26 (a) 4.483^+6,483. 

10.28 

10.32 Reject $H_0$.

10*44 a f -0 T i=U 2 7 3, 4. 

,10-45 Y, %=0, h I, 2, ■ • ■ ， w, 

/-I 

CHAPTER 11 

11.2 (a) If . (b) 675/1024; 

(C) (0_8) 4 , 

11.4 8. V 

11.6 0.954; 0.92; 0.788.

11.9 8.

11.12 (a) Beta($n - j + 1$, $j$).

(b) Beta($n - j + i - 1$, $j - i + 2$).

11.15 0.067.

11.18 Reject $H_0$.

11.25 0; $4(4^n - 1)/3$; no.

1L37 矣 . 

^44 98; f. 
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Index 


Absolute-error loss function, 311, 367
Adaptive methods, 536, 542
Algebra of sets, 4
Analysis of variance, 466
Ancillary statistic, 347, 353
Andrews, D. F., 393
Approximate distribution, 248, 251, 381, 392, 525
  chi-square, 295, 422
  normal for binomial, 249, 499
  normal for chi-square, 244
  normal for Poisson, 246
  Poisson for binomial, 244
Arc sine transformation, 252, 273
Asymptotically efficient, 379

Basu, D., 354
Bayes' formula, 23, 364
Bayesian methods, 363, 437
Bernoulli trials, 116
Bernstein, S., 112
Best critical region, 396, 399, 402
Beta distribution, 180, 504
Biased estimator, 263
Binary statistic, 514
Binomial distribution, 118, 244, 249, 254, 498, 506
Bivariate normal distribution, 146, 212, 226, 346, 385, 439, 478
Boole's inequality, 465
Borel measurable function, 29, 156
Box-Muller transformation, 177
Burr distribution, 372

Cauchy distribution, 175, 257, 387
Censoring, 49
Central limit theorem, 246, 511
Change of variable, 163, 168, 186
Characteristic function, 64
Characterization, 202, 214
Chebyshev's inequality, 68, 120, 222, 240
Chi-square distribution, 134, 144, 210, 294, 447, 482, 489, 491
Chi-square test, 293, 424
Classification, 439, 496
Cochran's theorem, 490
Column effect, 467, 470
Complement of a set, 7
Complete sufficient statistics, 332, 335, 343, 353, 537
Completeness, 329, 343
Composite hypothesis, 284, 288, 406, 413
Compounding, 372
Conditional distribution, 82, 148
Conditional expectation, 84, 110
Conditional mean, 85, 93, 123, 148, 357, 367
Conditional probability, 83
Conditional p.d.f., 83, 109, 148, 364
Conditional variance, 85, 95, 148, 357
Confidence coefficient, 270
Confidence interval, 268, 289, 462
  for difference of means, 276
  for means, 268, 462
  for p, 272
  for quantiles, 497
  for ratio of variances, 280
  for regression parameters, 473
  for variances, 276
Contingency tables, 299
Continuous-type random variables, 37, 108, 168, 193
Contrasts, 464
Control limits, 431
Convergence, 233, 239, 240
  in distribution, 233
  in probability, 239
  with probability one, 240
Convolution formula, 178
Correlation coefficient, 91, 100, 105, 123, 150, 377, 478, 534
Covariance, 92, 386, 535
Covariance matrix, 227, 385
Coverage, 503
Cramer, 243
Critical region, 282, 284
  best, 396, 399, 402
  size, 285
  uniformly most powerful, 405
Curtiss, J. H., 243
CUSUMS, 432

Decision function, 308, 433
Degenerate distribution, 65, 235
Degrees of freedom, 134, 298, 422
Delta method, 251
Dependence, 101
Design matrix, 493
Discrete-type random variable, 28
Distribution
  Bernoulli, 116
  beta, 180, 504
  binomial, 118, 244, 249, 254, 498, 506
  bivariate normal, 146, 212, 226, 346, 385, 439, 478
  Burr, 372
  Cauchy, 175, 257, 387
  chi-square, 134, 144, 210, 294, 447, 482, 489, 491
  conditional, 82, 109, 148, 364
  continuous-type, 37, 108
  of coverages, 503
  degenerate, 65, 235
  Dirichlet, 188, 371, 504
  discrete-type, 28
  double exponential, 376
  empirical, 158
  exponential, 133, 203
  exponential class, 333, 343
  of F, 182, 221, 421, 451, 463
  function, 34, 37, 44, 78, 108, 501
  of functions of random variables, 155
  of F(X), 161
  gamma, 131, 202
  geometric, 121
  hypergeometric, 34, 56, 517
  limiting, 233, 237, 243, 253, 294, 380
  of linear functions, 208
  logistic, 178
  lognormal, 154, 222
  marginal, 80, 93, 101, 109
  multinomial, 121, 199, 295, 515, 527
  multivariate normal, 223, 294, 482
  negative binomial, 121
  of noncentral chi-square, 301, 458
  of noncentral F, 458, 468
  of noncentral T, 420, 460
  normal, 138, 143, 147, 208, 214, 247, 381, 446
  of $nS^2/\sigma^2$, 214
  of order statistics, 193, 258
  Pareto, 267
  Poisson, 126, 166, 244
  posterior, 367, 493
  prior, 367, 493
  of quadratic forms, 447, 481
  of R, 480
  of runs, 518
  of sample, 158
  of sample mean, 214, 220, 249
  of sample variance, 214
  of T, 181, 217, 218, 238, 277, 356, 415, 419, 476, 480
  trinomial, 122, 371
  truncated, 146
  uniform, 48, 160
  Weibull, 137, 201, 372
Distribution-free methods, 498, 506, 537
Distribution function, 34, 37, 44, 78, 108, 501
Distribution-function technique, 50, 159
Double exponential distribution, 176

Efficiency, 377
Element, 3
Empirical distribution, 158
Equally likely events, 15
Estimation, 259, 307, 363
  Bayesian, 363
  interval, 268, 370
  maximum likelihood, 261, 324, 380, 385, 389
  method of moments, 266
  minimax, 309
  minimum chi-square, 298
  point, 259, 307
  robust, 387
  unbiased, 263, 307, 340, 381, 542

Index 
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unbiased minimum variance, 307, 326, 332, 
338 

Estimator, 259 t 307, 363 
consistent, 264, 384 
efficient, 377 

maximum likdihood, 262, 324,380,385,389 

■ 

minimum chi-square, 298 
minimum mean-squarc-error t 310 
unbiased, 263, 30*7, 340, 381, 542 
unbiased minimum variance, 307, 326, 332, 
338 
Events, 2 
equally likely, 15 
exhaustive, 15, 22 t 297 
independent, 24, 27, 103 
mutually exclusive, 15, 22, 297 
Expectation (expected value), 52, 87, 109,205 
218 

conditional^ 84 
of a product, 104 
Exponential class, 333, 343 
Exponential distribution, 133, 203 

F-distribution, 182, 221,421, 451, 463 
Factorial moment, 65 
Factorization theorem, 318, 334, 341 
Family of distributions, 260, 330 
Fisher, R. A., 372, 441 
Fisher’s information, 372, 385 
Fisher's linear discriminate function, 441 
Frequency, 2 
relative, 2, 12, 17 
Function, characteristic, 64 
d©:ision, 308, 433 

distribution ， 34, 37, 44 t 78, 108 t SOI 
gamma, 131 
likelihood, 261， 416 
loss, 308, 367, 434 

moment-generating, 59, 97, 105, U K 203, 
209, 243, 486, 510 
of parameter, 33S 
point, 7 

power, 282, 285, 443 

probability density, 33, 39, 45, 50, 76, 108, 
397 

probability distribution, 34, 37* 44 
probability set, 12, 29, 47, 75, 108 
of random variables, 155 
risk, 308, 368, 434 
set，7 

Gamma distribution, 131, 202 
Gamma function, 131 
Geometric distribution, 121 


Geometric mean t 336 
Gini's mean difference, 203 
Goal post loss function, 311 
Gosset, W. 182 

Huber, P,, 390 

Hypergeometric distribution, 34, 56, 517 
Hypothesis, 280, 284, 395 

Independence, 19, 24, 1 】 I ， 150, 157, 353, 480, 
537 

of linear fonns, 214, 228 
mutual, 25, 111 
pairwise, 25, 112 

of quadratic forms* 447, 481 f 486 
test_of, 478 

of JPand5 2 T 2I7 t 231, 354 
Independent events, 24* 27 ^ 

Independent random variables, 100, 157, 167, 
176 

Independent trials, 26 
Inequality
Boole, 465
Chebyshev, 68, 120, 222, 240
Rao-Blackwell, 90, 326
Rao-Cramér, 372
Information, Fisher's, 372, 385
Interaction, 469
Intersection of sets, 5 
Interval 

confidence, 270, 289, 462
prediction, 149, 275 
random, 269 
tolerance, 500, 503 

Invariance property of maximum likelihood 
estimation, 265, 474 

Jacobian, 179, 186, 224

Joint conditional distribution, 110

Joint distribution function, 79 

Joint probability density function, 79, 111, 397 

Kurtosis, 66, 539 

Law of large numbers, 120, 222, 240 

Law of total probability, 23 

Least squares, 473 

Lehmann alternative, 529 

Lehmann-Scheffé, 332

Levy, P., 243 

Liapounov, 511

Likelihood function, 261, 416 

Likelihood principle, 312, 324 


Likelihood ratio tests, 409, 413, 422, 452, 467 
Limiting distribution, 233, 237, 243, 253, 294, 380

Limiting moment-generating function, 243 
Linear discriminant function, 441 
Linear functions, 208
covariance, 223 
mean, 219 

moment-generating function, 209, 228 
variance, 219 
Linear model, 472 
Linear rank statistic, 529 
Location-invariant statistic, 351, 355, 443, 542
Logistic distribution, 178 
Lognormal distribution, 154, 222 
Loss function, 308, 367, 434 

Mann-Whitney-Wilcoxon, 521
Marginal probability density function, 80, 93, 
101, 109 

Mathematical expectation, 52, 88, 109, 205, 218

Maximum likelihood, 261 
estimator, 261, 324, 380, 385, 389 
method of, 261 
Mean ， 5M 58 

conditional, 85, 93, 123, 148, 357, 367
of distribution, 58
of linear function, 219, 530
of a sample, 158

of X, 58 
of X̄, 220
Median, 44, 57 
Median test, 516 
M-estimators, 387
Method 

of least squares, 473 
of maximum likelihood, 261 
of moments, 266
Midrange, 200

Minimal sufficient statistics, 347 
Minimax, criterion, 309, 433
decision function, 310, 433
Minimum chi-square estimates, 298
Minimum mean-square-error estimates, 310
Mode, 43
Model, 15, 40, 78, 325, 472

Moment-generating function, 59, 97, 105, 111, 203, 209, 243, 486, 510
of binomial distribution, 118
of bivariate normal distribution, 149
of chi-square distribution, 134
of gamma distribution, 133
of multinomial distribution, 123
of multivariate normal distribution, 226
of noncentral chi-square distribution, 448
of normal distribution, 139
of Poisson distribution, 128
of trinomial distribution, 123
of X, 209

Moments, 62, 65 

factorial, 65

method of, 266 

Monotone likelihood ratio, 409
Most powerful test, 397, 405, 443
Multinomial distribution, 121, 199, 295, 515, 527

Multiple comparisons, 461 
Multiplication rule, 21 

Multivariate normal distribution, 223, 294, 482
Mutually exclusive events, 15, 30, 297
Mutually independent events, 25, 111
Mutually independent random variables, 111

Negative binomial distribution, 122 
Neyman factorization theorem, 318, 334, 341 
Neyman-Pearson theorem, 397, 435
Noncentral chi-square, 301, 458 
Noncentral F, 458, 468
Noncentral parameter, 420, 459, 468, 485 
Noncentral T, 420, 460 
Nonparametric methods, 497, 506, 536
Normal distribution, 138, 143, 147, 208, 214, 247, 381, 446
Normal equations, 493
Normal scores, 514, 533
Null hypothesis, 288, 413
Null set, 5, 13

Observations, 156 
One-sided test, 288 

Order statistics, 193, 258, 347, 498, 501, 527 
distribution, 193, 258 
functions of, 200, 503 
Outcome, 1, 12

Paired t-test, 306
Pairwise independence, 25, 112 
Parameter, 118, 129, 134, 143, 420 
function of, 338 
location, 143, 350, 388 
scale, 144, 351
shape, 144

Parameter space, 260, 415, 434 

Pareto distribution, 267
Partition, 15, 22, 315
Pearson, Karl, 293
Percentile, 44, 505
Personal probability, 3, 367 
PERT, 203 


Point estimate, 259, 307
Poisson distribution, 126, 166, 244
Poisson process, 127, 132, 137
Posterior distribution, 23, 367, 493
Power, 282, 397, 405 
function, 282, 285 


Power of a test, 285, 397, 405, 443
Prediction interval, 149, 275
Prior distribution, 23, 367, 493
Probability, 2, 12, 17, 29
conditional, 19, 23, 83
induced, 29, 37
measure, 2, 12, 29
models, 15, 17, 47, 78
posterior, 23
prior, 23
subjective, 3, 367

Probability density function, 33, 39, 45, 50, 76, 108, 397
conditional, 83, 109
exponential class, 333, 343
joint, 79, 111, 397
marginal, 80, 93, 101, 109
posterior, 367, 493
prior, 367, 493

Probability set function, 12, 29, 47, 75, 108
Statistical Inference


Confidence Intervals for Means: Normal Assumptions 

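$\bar{x} \pm a\sigma/\sqrt{n}$, where $\Phi(a) = 1 - \alpha/2$, for $\mu$ with $\sigma^2$ known

$\bar{x} \pm bs/\sqrt{n-1}$, where $\Pr(T \le b) = 1 - \alpha/2$, for $\mu$ with $\sigma^2$ unknown

$\bar{x}_1 - \bar{x}_2 \pm c\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$, where $\Pr(T \le c) = 1 - \alpha/2$, for $\mu_1 - \mu_2$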
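
As an illustration of the first interval (mean of a normal distribution with known variance), the following is a minimal Python sketch. The function name `z_interval` and the numbers in the example are assumptions chosen for illustration; they are not taken from the text. The constant a is obtained from the standard normal quantile function, so that Φ(a) = 1 − α/2.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar, sigma, n, alpha=0.05):
    """Two-sided interval xbar +/- a*sigma/sqrt(n) for the mean, sigma known.

    `a` satisfies Phi(a) = 1 - alpha/2 for the standard normal c.d.f. Phi.
    """
    a = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = a * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical data summary: sample mean 10, known sigma 2, n = 25.
low, high = z_interval(xbar=10.0, sigma=2.0, n=25, alpha=0.05)
print(round(low, 3), round(high, 3))
```

The intervals with unknown variance have the same shape, but the constant comes from a t-distribution, which the Python standard library does not tabulate; in that case the constant would have to be supplied from a table.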


Approximate Confidence Intervals for Binomial Parameters 
$\dfrac{y}{n} \pm a\sqrt{\dfrac{(y/n)(1 - y/n)}{n}}$, where $\Phi(a) = 1 - \alpha/2$, for $p$
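
The approximate interval for a binomial parameter p can be sketched in the same way. The function name `binomial_interval` and the counts used below are illustrative assumptions; y denotes the observed number of successes in n independent trials.

```python
from math import sqrt
from statistics import NormalDist


def binomial_interval(y, n, alpha=0.05):
    """Approximate interval y/n +/- a*sqrt((y/n)(1 - y/n)/n) for p.

    `a` satisfies Phi(a) = 1 - alpha/2.
    """
    p_hat = y / n
    a = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = a * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical counts: 40 successes in 100 trials.
lo, hi = binomial_interval(y=40, n=100, alpha=0.05)
print(round(lo, 4), round(hi, 4))
```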
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One-Sided Tests of Hypotheses: Normal Assumptions 

$H_0: \mu = \mu_0$ against $H_1: \mu > \mu_0$. Reject $H_0$ if

$$\frac{\bar{x} - \mu_0}{s/\sqrt{n-1}} \ge c, \quad \text{where } \Pr(T \ge c) = \alpha$$

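$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 > \mu_2$. Reject $H_0$ if

$$\frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \ge d, \quad \text{where } \Pr(T \ge d) = \alpha$$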
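
The two-sample statistic can be computed directly from the data. The sketch below assumes, as the factor n − 1 in the one-sample formulas suggests, that each sample variance is taken with divisor n rather than n − 1; the function name `pooled_t_statistic` and the data are illustrative assumptions. The critical value d is not computed here and would come from a table of the t-distribution with n₁ + n₂ − 2 degrees of freedom.

```python
from math import sqrt


def pooled_t_statistic(sample1, sample2):
    """Two-sample statistic with pooled variance (equal-variance case).

    Sample variances use divisor n (an assumption about the convention),
    so the pooled term is (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2).
    """
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    s1_sq = sum((x - mean1) ** 2 for x in sample1) / n1
    s2_sq = sum((x - mean2) ** 2 for x in sample2) / n2
    pooled = (n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled * (1 / n1 + 1 / n2))


# Hypothetical samples.
t_value = pooled_t_statistic([5, 7, 9], [1, 2, 3, 4, 5])
print(round(t_value, 4))
```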


One-Sided Test of Hypothesis About p 

$H_0: p = p_0$ against $H_1: p > p_0$. Reject $H_0$ if

$$\frac{Y/n - p_0}{\sqrt{p_0(1 - p_0)/n}} \ge k, \quad \text{where } \Phi(k) = 1 - \alpha$$
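
A short sketch of the one-sided test about p follows. The function name `one_sided_p_test` and the counts are assumptions for illustration; the constant k is taken from the standard normal quantile function so that Φ(k) = 1 − α.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_test(y, n, p0, alpha=0.05):
    """Test H0: p = p0 against H1: p > p0 using the normal approximation.

    Returns the statistic, the critical value k with Phi(k) = 1 - alpha,
    and whether H0 is rejected.
    """
    z = (y / n - p0) / sqrt(p0 * (1 - p0) / n)
    k = NormalDist().inv_cdf(1 - alpha)
    return z, k, z >= k


# Hypothetical counts: 60 successes in 100 trials, testing p0 = 0.5.
z, k, reject = one_sided_p_test(y=60, n=100, p0=0.5, alpha=0.05)
print(round(z, 3), round(k, 3), reject)
```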


Chi-Square Test 

Reject null hypothesis concerning probabilities if

$$\sum_i \frac{(\mathrm{Obs}_i - \mathrm{Exp}_i)^2}{\mathrm{Exp}_i} \ge h,$$

where $h$ is the $100(1 - \alpha)$ percentile of $\chi^2(r)$, where $r$ is the difference of the
dimension of the total parameter space and that of
the parameter space under the null hypothesis
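
The chi-square statistic is a simple sum over cells. In the sketch below, the function name `chi_square_statistic` and the die counts are illustrative assumptions. For a fair six-sided die the null hypothesis fixes all cell probabilities, so r = 5; the critical value 11.07 is the tabulated 95th percentile of χ²(5) and is supplied by hand because the Python standard library has no chi-square quantile function.

```python
def chi_square_statistic(observed, expected):
    """Sum over cells of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 casts of a die; a fair die expects 10 per face.
observed = [13, 19, 11, 8, 5, 4]
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)
h = 11.07  # tabulated 95th percentile of chi-square with r = 5 (alpha = 0.05)
reject_null = statistic >= h
print(statistic, reject_null)
```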



Other Important Concepts


Sufficient Statistics 


The statistic $u(X_1, X_2, \ldots, X_n)$ is sufficient for $\theta$ if
and only if the joint p.d.f. of $X_1, X_2, \ldots, X_n$ equals

$$k_1[u(x_1, x_2, \ldots, x_n); \theta]\, k_2(x_1, x_2, \ldots, x_n),$$

where the function $k_2$ does not depend upon $\theta$.


Let $Y_1$ and $Y_2$ be two statistics such that $Y_1$ is sufficient for
$\theta$ and $Y_2$ is an unbiased estimator of $\theta$. Then $E(Y_2 \mid Y_1) = \varphi(Y_1)$
is unbiased and $\mathrm{var}[\varphi(Y_1)] \le \mathrm{var}(Y_2)$.

If the random sample arises from a distribution with p.d.f.

$$f(x; \theta) = \exp[p(\theta)K(x) + S(x) + q(\theta)], \quad a < x < b,$$

then $\sum_{i=1}^{n} K(X_i)$ is a complete sufficient statistic for $\theta$.

If $Y_1$ and $Y_2$ are statistics such that $Y_1$ is complete and sufficient
for $\theta$ and $Y_2$ has a distribution that does not depend upon
$\theta$, then $Y_1$ and $Y_2$ are independent.


Maximum Likelihood Estimators and Related Tests 


The maximum likelihood estimator $\hat{\theta}$, which maximizes $L(\theta)$,
the joint p.d.f. of the random variables, is a function of the
sufficient statistic if it exists. Under certain conditions, $\hat{\theta}$ has
an approximate normal distribution with mean $\theta$ and variance
$1/[nI(\theta)]$, where the Fisher information $I(\theta)$ equals

$$E\left\{\left[\frac{\partial \ln f(X; \theta)}{\partial \theta}\right]^2\right\},$$

and thus is asymptotically efficient.
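
As a worked illustration of the Fisher information and the approximate variance 1/[nI(θ)], consider a Bernoulli distribution with f(x; p) = pˣ(1 − p)¹⁻ˣ, x = 0, 1. This example and the function names below are assumptions added for illustration, not part of the summary. The expectation of the squared derivative of ln f can be evaluated exactly because X takes only two values, and it equals 1/[p(1 − p)].

```python
def bernoulli_fisher_information(p):
    """E[(d/dp ln f(X; p))^2] for f(x; p) = p^x (1 - p)^(1 - x), x in {0, 1}."""
    score_at_1 = 1 / p          # derivative of ln p
    score_at_0 = -1 / (1 - p)   # derivative of ln(1 - p)
    return p * score_at_1 ** 2 + (1 - p) * score_at_0 ** 2


def mle_asymptotic_variance(p, n):
    """Approximate variance 1 / (n * I(p)) of the maximum likelihood estimator."""
    return 1 / (n * bernoulli_fisher_information(p))


# Hypothetical values: p = 0.3, n = 50.
information = bernoulli_fisher_information(0.3)
variance = mle_asymptotic_variance(0.3, 50)
print(round(information, 4), round(variance, 6))
```

In this case the approximate variance reduces to p(1 − p)/n, which is the exact variance of the sample proportion, the maximum likelihood estimator of p.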


The region defined by $L(\theta')/L(\theta'') \le k$ provides a best critical
region for testing $H_0: \theta = \theta'$ against $H_1: \theta = \theta''$.

A likelihood ratio test is defined by $\lambda = L(\hat{\omega})/L(\hat{\Omega}) \le \lambda_0$,
where $L(\hat{\omega})$ and $L(\hat{\Omega})$ represent, respectively, the maxima of the
likelihood function in the parameter space $\omega$ and $\Omega$, $\omega \subset \Omega$.
Let $Q, Q_1, \ldots, Q_{k-1}, Q_k$ be quadratic forms in normally
distributed random variables such that $Q$ is $\chi^2(r)$,
$Q_i$ is $\chi^2(r_i)$, $i = 1, 2, \ldots, k - 1$, $Q_k \ge 0$, and
$Q = Q_1 + \cdots + Q_{k-1} + Q_k$. Then $Q_1, \ldots, Q_{k-1}, Q_k$ are
independent and $Q_k$ is $\chi^2(r - r_1 - \cdots - r_{k-1})$.